Bibliographic Details
Main Authors: Pavlich, Ryan; Ebadi, Nima; Tarbell, Richard; Linares, Billy; Tan, Adrian; Humphreys, Rachael; Das, Jayanta Kumar; Ghandiparsi, Rambod; Haley, Hannah; George, Jerris; Slavin, Rocky; Choo, Kim-Kwang Raymond; Dietrich, Glenn; Rios, Anthony
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2406.17574
author Pavlich, Ryan
Ebadi, Nima
Tarbell, Richard
Linares, Billy
Tan, Adrian
Humphreys, Rachael
Das, Jayanta Kumar
Ghandiparsi, Rambod
Haley, Hannah
George, Jerris
Slavin, Rocky
Choo, Kim-Kwang Raymond
Dietrich, Glenn
Rios, Anthony
contents Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. While substantial progress has been made in this field, existing research has concentrated on generating SQL statements from text queries. The broader challenge, however, lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet-of-Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset contains additional query types limited in prior text-to-SQL datasets, notably temporal-related queries. Our dataset is sourced from a smart building's IoT ecosystem exploring sensor read and network traffic data. Second, our dataset allows two-stage processing, where the returned data (network traffic) from a generated SQL can be categorized as malicious or not. Our results show that joint training to query and infer information about the data can improve overall text-to-SQL performance, nearly matching substantially larger models. We also show that current large language models (e.g., GPT-3.5) struggle to infer new information about returned data; thus, our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2406_17574
institution arXiv
publishDate 2024
record_format arxiv
title Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats
topic Computation and Language
url https://arxiv.org/abs/2406.17574