Bibliographic Details
Main Authors: Fokoue, Achille, Jayaraman, Srideepika, Khabiri, Elham, Kephart, Jeffrey O., Li, Yingjie, Shah, Dhruv, Drissi, Youssef, Heath III, Fenno F., Bhamidipaty, Anu, Tipu, Fateh A., Baseman, Robert J.
Format: Preprint
Published: 2024
Subjects: Databases, Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.05735
author Fokoue, Achille
Jayaraman, Srideepika
Khabiri, Elham
Kephart, Jeffrey O.
Li, Yingjie
Shah, Dhruv
Drissi, Youssef
Heath III, Fenno F.
Bhamidipaty, Anu
Tipu, Fateh A.
Baseman, Robert J.
contents In many industrial settings, users wish to ask questions whose answers may be found in structured data sources such as spreadsheets, databases, APIs, or combinations thereof. Often, the user doesn't know how to identify or access the right data source. This problem is compounded even further if multiple (and potentially siloed) data sources must be assembled to derive the answer. Recently, various Text-to-SQL applications that leverage Large Language Models (LLMs) have addressed some of these problems by enabling users to ask questions in natural language. However, these applications remain impractical in realistic industrial settings because they fail to cope with the data source heterogeneity that typifies such environments. In this paper, we address heterogeneity by introducing the siwarex platform, which enables seamless natural language access to both databases and APIs. To demonstrate the effectiveness of siwarex, we extend the popular Spider dataset and benchmark by replacing some of its tables with data retrieval APIs. We find that siwarex does a good job of coping with data source heterogeneity. Our modified Spider benchmark will soon be available to the research community.
format Preprint
id arxiv_https___arxiv_org_abs_2409_05735
institution arXiv
publishDate 2024
record_format arxiv
title A System and Benchmark for LLM-based Q&A on Heterogeneous Data
topic Databases
Artificial Intelligence
url https://arxiv.org/abs/2409.05735