:: Library Catalog

Bibliographic Details
Main Authors:	Shankar, Shreya; Li, Haotian; Asawa, Parth; Hulsebos, Madelon; Lin, Yiming; Zamfirescu-Pereira, J. D.; Chase, Harrison; Fu-Hinthorn, Will; Parameswaran, Aditya G.; Wu, Eugene
Format:	Preprint
Published:	2024
Subjects:	Databases; Software Engineering
Online Access:	https://arxiv.org/abs/2401.03038

Similar Items

PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
by: Vir, Reya, et al.
Published: (2025)

Towards Accurate and Efficient Document Analytics with Large Language Models
by: Lin, Yiming, et al.
Published: (2024)

TARGET: Benchmarking Table Retrieval for Generative Tasks
by: Ji, Xingyu, et al.
Published: (2025)

LLM-Powered Proactive Data Systems
by: Zeighami, Sepanta, et al.
Published: (2025)

Task Cascades for Efficient Unstructured Data Processing
by: Shankar, Shreya, et al.
Published: (2026)

Featurized-Decomposition Join: Low-Cost Semantic Joins with Guarantees
by: Zeighami, Sepanta, et al.
Published: (2025)

Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees
by: Zeighami, Sepanta, et al.
Published: (2025)

Towards Contextual Sensitive Data Detection
by: Telkamp, Liang, et al.
Published: (2025)

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
by: Shankar, Shreya, et al.
Published: (2024)

Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis
by: Gomm, Daniel, et al.
Published: (2025)

Rethinking Dataset Discovery with DataScout
by: Lin, Rachel, et al.
Published: (2025)

Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle
by: Garcia, Rolando, et al.
Published: (2024)

Testing Database Systems with Large Language Model Synthesized Fragments
by: Zhong, Suyang, et al.
Published: (2025)

Semantic Data Processing with Holistic Data Understanding
by: Sun, Youran, et al.
Published: (2026)

Observatory: Characterizing Embeddings of Relational Tables
by: Cong, Tianji, et al.
Published: (2023)

Fine-Grained Table Retrieval Through the Lens of Complex Queries
by: Kosiuk, Wojciech, et al.
Published: (2026)

DIRT: Database-Integrated Random Testing
by: Keles, Alperen, et al.
Published: (2026)

Multi-Objective Agentic Rewrites for Unstructured Data Processing
by: Wei, Lindsey Linxi, et al.
Published: (2025)

Prompt Migration: Stabilizing GenAI Applications with Evolving Large Language Models
by: Tripathi, Shivani, et al.
Published: (2025)

CUBES: A Parallel Synthesizer for SQL Using Examples
by: Brancas, Ricardo, et al.
Published: (2022)

Synthesizing Document Database Queries using Collection Abstractions
by: Liu, Qikang, et al.
Published: (2024)

Steering Semantic Data Processing With DocWrangler
by: Shankar, Shreya, et al.
Published: (2025)

AI-Assisted SQL Authoring at Industry Scale
by: Maddila, Chandra, et al.
Published: (2024)

Quality Assessment of Tabular Data using Large Language Models and Code Generation
by: Akella, Ashlesha, et al.
Published: (2025)

Can AI Agents Answer Your Data Questions? A Benchmark for Data Agents
by: Ma, Ruiying, et al.
Published: (2026)

The Time is Here for Just-in-Time Systems: Challenges and Opportunities
by: Liu, Shu, et al.
Published: (2026)

Process Modeling With Large Language Models
by: Kourani, Humam, et al.
Published: (2024)

Object Graph Programming
by: Thimmaiah, Aditya, et al.
Published: (2024)

Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement
by: Sarker, Shouvon, et al.
Published: (2024)

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
by: Shankar, Shreya, et al.
Published: (2024)

Adaptive Data Quality Scoring Operations Framework using Drift-Aware Mechanism for Industrial Applications
by: Bayram, Firas, et al.
Published: (2024)

Liberal Entity Matching as a Compound AI Toolchain
by: Fu, Silvery D., et al.
Published: (2024)

Automated Tensor-Relational Decomposition for Large-Scale Sparse Tensor Computation
by: Tang, Yuxin, et al.
Published: (2026)

How well do LLMs reason over tabular data, really?
by: Wolff, Cornelius, et al.
Published: (2025)

GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models
by: Hou, Shuyang, et al.
Published: (2024)

AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation
by: Pulavarthi, Vaishnavi, et al.
Published: (2024)

Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Logs
by: Rebmann, Adrian, et al.
Published: (2024)

Dinkel: State-Aware and Granular Framework for Validating Graph Databases
by: Wüst, Celine, et al.
Published: (2024)

Vextra: A Unified Middleware Abstraction for Heterogeneous Vector Database Systems
by: Suri, Chandan, et al.
Published: (2026)

A Practical Framework for Flaky Failure Triage in Distributed Database Continuous Integration
by: Zhu, Jun-Peng, et al.
Published: (2026)