Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Sarwar, Tabinda; Moghimifar, Farhad; Hoang, Cong Duy Vu; Ma, Xiaoxiao; Xu, Shawn Chang; Saleh, Fahimeh; Zaremoodi, Poorya; Sil, Avirup; Kirchhoff, Katrin
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.22313

_version_	1866911621294587904
author	Sarwar, Tabinda; Moghimifar, Farhad; Hoang, Cong Duy Vu; Ma, Xiaoxiao; Xu, Shawn Chang; Saleh, Fahimeh; Zaremoodi, Poorya; Sil, Avirup; Kirchhoff, Katrin
contents	NL2SQL systems deployed in industry settings often encounter ambiguous or unanswerable queries, particularly in interactive scenarios with incomplete user clarification. Existing benchmarks typically assume a single source of ambiguity and rely on user interaction for resolution, overlooking realistic failure modes. We introduce Clarity, a framework for automatically generating an NL2SQL benchmark with multi-faceted ambiguities and diverse user behaviors across both single- and multi-turn settings. Using a constraint-driven pipeline, Clarity transforms executable SQL into ambiguous queries, augmented with grounded conversational continuations and schema-level metadata. Empirical evaluation on Spider and BIRD shows that leading NL2SQL systems, including those based on strong LLMs, suffer significant performance degradation under multi-faceted ambiguity. While these systems often detect ambiguity, they struggle to accurately localize and resolve the underlying schema-level sources. Our results highlight the need for more robust ambiguity detection and resolution in industry-grade NL2SQL systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_22313
institution	arXiv
publishDate	2026
record_format	arxiv
title	CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems
topic	Computation and Language
url	https://arxiv.org/abs/2604.22313

Similar Items