| Main Authors: | Akella, Ashlesha; Narayanam, Krishnasuri |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Artificial Intelligence; Databases; Software Engineering |
| Online Access: | https://arxiv.org/abs/2502.15732 |
| _version_ | 1866915166448254976 |
|---|---|
| author | Akella, Ashlesha; Narayanam, Krishnasuri |
| author_facet | Akella, Ashlesha; Narayanam, Krishnasuri |
| contents | Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often cannot capture semantic context, while deep learning approaches are resource-intensive and require task- and dataset-specific training. To overcome these shortcomings, we present an automated system that uses large language models to generate executable code for tasks such as missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while leveraging external knowledge, effectively addressing both memory-dependent and memory-independent tasks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_15732 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Data Wrangling Task Automation Using Code-Generating Language Models |
| topic | Machine Learning; Artificial Intelligence; Databases; Software Engineering |
| url | https://arxiv.org/abs/2502.15732 |