Bibliographic Details
Main Authors: Wu, Lixia, Li, Peng, Lou, Junhong, Fu, Lei
Format: Preprint
Published: 2024
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.15985
_version_ 1866916408819974144
author Wu, Lixia
Li, Peng
Lou, Junhong
Fu, Lei
contents To address the task of translating natural language queries into SQL commands, we propose a suite of compact, fine-tuned models and self-refine mechanisms that democratize data access and analysis for non-expert users while mitigating the risks associated with closed-source Large Language Models. Specifically, we constructed a dataset of over 20K samples for Text-to-SQL, as well as a preference dataset, to improve efficiency in SQL generation. To further ensure code validity, a code corrector was integrated into the model. Our system, DataGpt-sql, achieved 87.2% accuracy on spider-dev, showcasing the effectiveness of our solution in text-to-SQL conversion tasks. Our code, data, and models are available at https://github.com/CainiaoTechAi/datagpt-sql-7b
format Preprint
id arxiv_https___arxiv_org_abs_2409_15985
institution arXiv
publishDate 2024
record_format arxiv
title DataGpt-SQL-7B: An Open-Source Language Model for Text-to-SQL
topic Artificial Intelligence
url https://arxiv.org/abs/2409.15985