| Main Authors: | Li, Fengyu, Zhu, Junhao, Song, Kaishi, Chen, Lu, Yao, Zhongming, Li, Tianyi, Jensen, Christian S. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.22721 |
Similar Items
Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization
by: Zhu, Junhao, et al.
Published: (2025)
OneDB: A Distributed Multi-Metric Data Similarity Search System
by: Qian, Tang, et al.
Published: (2025)
In-Memory Indexing and Querying of Provenance in Data Preparation Pipelines
by: Belhajjame, Khalid, et al.
Published: (2025)
LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction
by: Chang, Jing, et al.
Published: (2025)
ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction
by: Chang, Jing, et al.
Published: (2025)
Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations
by: Tian, Yuan, et al.
Published: (2023)
Towards Next Generation Data Engineering Pipelines
by: Kramer, Kevin M., et al.
Published: (2025)
Credo: Declarative Control of LLM Pipelines via Beliefs and Policies
by: Lu, Duo, et al.
Published: (2026)
Towards Evolution Capabilities in Data Pipelines
by: Kramer, Kevin M.
Published: (2023)
A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks
by: Belém, Fabiano, et al.
Published: (2024)
Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence
by: Lai, Eugenie Y., et al.
Published: (2025)
MH-GIN: Multi-scale Heterogeneous Graph-based Imputation Network for AIS Data (Extended Version)
by: Liu, Hengyu, et al.
Published: (2025)
Step-by-Step Data Cleaning Recommendations to Improve ML Prediction Accuracy
by: Mohammed, Sedir, et al.
Published: (2025)
AgenticScholar: Agentic Data Management with Pipeline Orchestration for Scholarly Corpora
by: Lan, Hai, et al.
Published: (2026)
Making Prompts First-Class Citizens for Adaptive LLM Pipelines
by: Cetintemel, Ugur, et al.
Published: (2025)
Accelerating Fresh Data Exploration with Fluid ETL Pipelines
by: Norfolk, Maxwell, et al.
Published: (2026)
Quantifying Point Contributions: A Lightweight Framework for Efficient and Effective Query-Driven Trajectory Simplification
by: Song, Yumeng, et al.
Published: (2025)
Predictive Query-based Pipeline for Graph Data
by: Neto, Plácido A Souza
Published: (2024)
Modeling and Monitoring of Indoor Populations using Sparse Positioning Data (Extension)
by: Li, Xiao, et al.
Published: (2024)
Continuous Prompts: LLM-Augmented Pipeline Processing over Unstructured Streams
by: Chen, Shu, et al.
Published: (2025)
KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs
by: Hofer, Marvin, et al.
Published: (2025)
Grain Theory: Type-Level Granularity Correctness in Data Pipelines
by: Karayannidis, Nikos
Published: (2026)
PIPE-RDF: An LLM-Assisted Pipeline for Enterprise RDF Benchmarking
by: Ranganath, Suraj
Published: (2026)
SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines
by: Shankar, Shreya, et al.
Published: (2024)
Evaluation of Pipelines for Data Integration into Knowledge Graphs
by: Hofer, Marvin, et al.
Published: (2026)
Auto-Configuring Entity Resolution Pipelines
by: Nikoletos, Konstantinos, et al.
Published: (2025)
Spezi Data Pipeline: Streamlining FHIR-based Interoperable Digital Health Data Workflows
by: Bikia, Vasiliki, et al.
Published: (2025)
A Survey of Pipeline Tools for Data Engineering
by: Mbata, Anthony, et al.
Published: (2024)
ACTIVE: Continuous Similarity Search for Vessel Trajectories
by: Liu, Tiantian, et al.
Published: (2025)
QueryGym: Step-by-Step Interaction with Relational Databases
by: Ananthakrishnan, Haritha, et al.
Published: (2025)
DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation
by: Fan, Meihao, et al.
Published: (2026)
Are Large Language Models the New Interface for Data Pipelines?
by: Junior, Sylvio Barbon, et al.
Published: (2024)
AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning
by: Shi, Yuhan, et al.
Published: (2026)
Not All Neighbors Matter: Understanding the Impact of Graph Sparsification on GNN Pipelines
by: Song, Yuhang, et al.
Published: (2026)
Explaining Black-Box Clustering Pipelines With Cluster-Explorer
by: Ofek, Sariel, et al.
Published: (2024)
Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"
by: Grafberger, Stefan, et al.
Published: (2024)
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
by: Jin, Tengjun, et al.
Published: (2025)
Morphing-based Compression for Data-centric ML Pipelines
by: Baunsgaard, Sebastian, et al.
Published: (2025)
SemPipes -- Optimizable Semantic Data Operators for Tabular Machine Learning Pipelines
by: Ovcharenko, Olga, et al.
Published: (2026)
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
by: Lai, Eugenie, et al.
Published: (2025)