Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Aktaş, Burak; Baytekin, Mehmet Can; Köse, Süha Kağan; İlbilgi, Ömer; Yılmaz, Elif Özge; Toraman, Çağrı; Görür, Bilge Kaan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Databases
Online Access:	https://arxiv.org/abs/2602.03633

_version_	1866914304420216832
author	Aktaş, Burak; Baytekin, Mehmet Can; Köse, Süha Kağan; İlbilgi, Ömer; Yılmaz, Elif Özge; Toraman, Çağrı; Görür, Bilge Kaan
author_facet	Aktaş, Burak; Baytekin, Mehmet Can; Köse, Süha Kağan; İlbilgi, Ömer; Yılmaz, Elif Özge; Toraman, Çağrı; Görür, Bilge Kaan
contents	Text-to-SQL systems have achieved strong performance on English benchmarks, yet their behavior in morphologically rich, low-resource languages remains largely unexplored. We introduce BIRDTurk, the first Turkish adaptation of the BIRD benchmark, constructed through a controlled translation pipeline that adapts schema identifiers to Turkish while strictly preserving the logical structure and execution semantics of SQL queries and databases. Translation quality is validated on a sample size determined by the Central Limit Theorem to ensure 95% confidence, achieving 98.15% accuracy on human-evaluated samples. Using BIRDTurk, we evaluate inference-based prompting, agentic multi-stage reasoning, and supervised fine-tuning. Our results reveal that Turkish introduces consistent performance degradation, driven by both structural linguistic divergence and underrepresentation in LLM pretraining, while agentic reasoning demonstrates stronger cross-lingual robustness. Supervised fine-tuning remains challenging for standard multilingual baselines but scales effectively with modern instruction-tuned models. BIRDTurk provides a controlled testbed for cross-lingual Text-to-SQL evaluation under realistic database conditions. We release the training and development splits to support future research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_03633
institution	arXiv
publishDate	2026
record_format	arxiv
title	BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish
topic	Computation and Language; Artificial Intelligence; Databases
url	https://arxiv.org/abs/2602.03633
