| Main Authors: | Wu, Lixia; Li, Peng; Lou, Junhong; Fu, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2409.15985 |
| _version_ | 1866916408819974144 |
|---|---|
| author | Wu, Lixia; Li, Peng; Lou, Junhong; Fu, Lei |
| author_facet | Wu, Lixia; Li, Peng; Lou, Junhong; Fu, Lei |
| contents | In addressing the pivotal role of translating natural language queries into SQL commands, we propose a suite of compact, fine-tuned models and self-refine mechanisms to democratize data access and analysis for non-expert users, mitigating risks associated with closed-source Large Language Models. Specifically, we constructed a dataset of over 20K samples for Text-to-SQL, as well as a preference dataset, to improve the efficiency of SQL generation. To further ensure code validity, a code corrector was integrated into the model. Our system, DataGpt-sql, achieved 87.2% accuracy on spider-dev, showcasing the effectiveness of our solution in text-to-SQL conversion tasks. Our code, data, and models are available at https://github.com/CainiaoTechAi/datagpt-sql-7b |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2409_15985 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | DataGpt-SQL-7B: An Open-Source Language Model for Text-to-SQL |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2409.15985 |