Bibliographic Details
Main Authors: Akella, Ashlesha, Narayanam, Krishnasuri
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence, Databases, Software Engineering
Online Access: https://arxiv.org/abs/2502.15732
_version_ 1866915166448254976
author Akella, Ashlesha
Narayanam, Krishnasuri
author_facet Akella, Ashlesha
Narayanam, Krishnasuri
contents Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often fail to capture semantic context, while deep learning approaches are resource-intensive and require task- and dataset-specific training. To overcome these shortcomings, we present an automated system that uses large language models to generate executable code for tasks such as missing value imputation, error detection, and error correction. The system aims to identify inherent patterns in the data while leveraging external knowledge, addressing both memory-dependent and memory-independent tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2502_15732
institution arXiv
publishDate 2025
record_format arxiv
title Data Wrangling Task Automation Using Code-Generating Language Models
topic Machine Learning
Artificial Intelligence
Databases
Software Engineering
url https://arxiv.org/abs/2502.15732