Staff View: A System and Benchmark for LLM-based Q&A on Heterogeneous Data :: Library Catalog

Bibliographic Details
Main Authors:	Fokoue, Achille; Jayaraman, Srideepika; Khabiri, Elham; Kephart, Jeffrey O.; Li, Yingjie; Shah, Dhruv; Drissi, Youssef; Heath III, Fenno F.; Bhamidipaty, Anu; Tipu, Fateh A.; Baseman, Robert J.
Format:	Preprint
Published:	2024
Subjects:	Databases; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.05735

_version_	1866914947087204352
author	Fokoue, Achille; Jayaraman, Srideepika; Khabiri, Elham; Kephart, Jeffrey O.; Li, Yingjie; Shah, Dhruv; Drissi, Youssef; Heath III, Fenno F.; Bhamidipaty, Anu; Tipu, Fateh A.; Baseman, Robert J.
contents	In many industrial settings, users wish to ask questions whose answers may be found in structured data sources such as spreadsheets, databases, APIs, or combinations thereof. Often, the user doesn't know how to identify or access the right data source. This problem is compounded even further if multiple (and potentially siloed) data sources must be assembled to derive the answer. Recently, various Text-to-SQL applications that leverage Large Language Models (LLMs) have addressed some of these problems by enabling users to ask questions in natural language. However, these applications remain impractical in realistic industrial settings because they fail to cope with the data source heterogeneity that typifies such environments. In this paper, we address heterogeneity by introducing the siwarex platform, which enables seamless natural language access to both databases and APIs. To demonstrate the effectiveness of siwarex, we extend the popular Spider dataset and benchmark by replacing some of its tables with data retrieval APIs. We find that siwarex does a good job of coping with data source heterogeneity. Our modified Spider benchmark will soon be available to the research community.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_05735
institution	arXiv
publishDate	2024
record_format	arxiv
title	A System and Benchmark for LLM-based Q&A on Heterogeneous Data
topic	Databases; Artificial Intelligence
url	https://arxiv.org/abs/2409.05735
