Bibliographic Details
Main Authors: Sanmartino, Gabriele; Urban, Matthias; Papotti, Paolo; Binnig, Carsten
Format: Preprint
Published: 2026
Subjects: Databases
Online Access: https://arxiv.org/abs/2602.04430
author Sanmartino, Gabriele
Urban, Matthias
Papotti, Paolo
Binnig, Carsten
contents LLM-augmented data systems enable semantic querying over structured and unstructured data, but executing queries with LLM-powered operators introduces a fundamental runtime-accuracy trade-off. In this paper, we present Stretto, a new execution engine that provides end-to-end query guarantees while efficiently navigating this trade-off in a holistic manner. For this, Stretto formulates query planning as a constrained optimization problem and uses a gradient-based optimizer to jointly select operator implementations and allocate error budgets across pipelines. Moreover, to enable fine-grained execution choices, Stretto introduces a novel idea on how KV-caching can be used to realize a spectrum of different physical operators that transform a sparse design space into a dense continuum of runtime-accuracy trade-offs. Experiments show that Stretto outperforms state-of-the-art systems while consistently meeting quality guarantees.
format Preprint
id arxiv_https___arxiv_org_abs_2602_04430
institution arXiv
publishDate 2026
record_format arxiv
title The Stretto Execution Engine for LLM-Augmented Data Systems
topic Databases
url https://arxiv.org/abs/2602.04430