Bibliographic Details
Main Authors: Tian, Songsong, Zhuo, Kongsheng, Wang, Zhendong, Shen, Rong, Zhang, Shengtao, Wu, Yong
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.10318
_version_ 1866917205524873216
author Tian, Songsong
Zhuo, Kongsheng
Wang, Zhendong
Shen, Rong
Zhang, Shengtao
Wu, Yong
author_facet Tian, Songsong
Zhuo, Kongsheng
Wang, Zhendong
Shen, Rong
Zhang, Shengtao
Wu, Yong
contents In this paper, we present BAR-SQL (Boundary-Aware Reliable NL2SQL), a unified training framework that embeds reliability and boundary awareness directly into the generation process. We introduce a Seed Mutation data synthesis paradigm that constructs a representative enterprise corpus, explicitly encompassing multi-step analytical queries alongside boundary cases including ambiguity and schema limitations. To ensure interpretability, we employ Knowledge-Grounded Reasoning Synthesis, which produces Chain-of-Thought traces explicitly anchored in schema metadata and business rules. The model is trained through a two-stage process: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning via Group Relative Policy Optimization (GRPO). We design a Task-Conditioned Hybrid Reward mechanism that simultaneously optimizes SQL execution accuracy, leveraging Abstract Syntax Tree analysis and dense result matching, and semantic precision in abstention responses. To evaluate reliability alongside generation accuracy, we construct and release Ent-SQL-Bench, which jointly assesses SQL precision and boundary-aware abstention across ambiguous and unanswerable queries. Experimental results on this benchmark demonstrate that BAR-SQL achieves 91.48% average accuracy, outperforming leading proprietary models, including Claude 4.5 Sonnet and GPT-5, in both SQL generation quality and boundary-aware abstention capability. The source code and benchmark are available anonymously at: https://github.com/TianSongS/BAR-SQL.
format Preprint
id arxiv_https___arxiv_org_abs_2601_10318
institution arXiv
publishDate 2026
record_format arxiv
title Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis
topic Computation and Language
url https://arxiv.org/abs/2601.10318