Bibliographic Details
Main Authors: Akella, Ashlesha; Manatkar, Abhijit; Chavda, Brij; Patel, Hima
Format: Preprint
Published: 2024
Subjects: Machine Learning; Computation and Language
Online Access: https://arxiv.org/abs/2405.05618
_version_ 1866911997175529472
author Akella, Ashlesha
Manatkar, Abhijit
Chavda, Brij
Patel, Hima
author_facet Akella, Ashlesha
Manatkar, Abhijit
Chavda, Brij
Patel, Hima
contents Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts. However, creating effective prompts for tabular datasets is challenging due to the structured nature of the data and the need to manage numerous columns. This paper presents an innovative auto-prompt generation system suitable for multiple LLMs, with minimal training. It proposes two novel methods: 1) a reinforcement learning-based algorithm for identifying and sequencing task-relevant columns, and 2) a cell-level similarity-based approach for enhancing few-shot example selection. The approach has been extensively tested across 66 datasets, demonstrating improved performance in three downstream tasks (data imputation, error detection, and entity matching) using two distinct LLMs: Google flan-t5-xxl and Mixtral 8x7B.
format Preprint
id arxiv_https___arxiv_org_abs_2405_05618
institution arXiv
publishDate 2024
record_format arxiv
title An Automatic Prompt Generation System for Tabular Data Tasks
topic Machine Learning
Computation and Language
url https://arxiv.org/abs/2405.05618
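
The abstract names a cell-level similarity-based approach to few-shot example selection but this record does not describe how it is computed. The following is only a minimal sketch of the general idea, not the authors' method: it assumes each row is a dictionary of strings, scores each pair of cells with word-level Jaccard overlap, and averages those scores over a chosen set of columns. The function names, the similarity measure, and the sample rows are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def cell_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two cells (an assumed measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty cells are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def row_similarity(query: Row, candidate: Row, columns: List[str]) -> float:
    """Average the per-cell scores over the selected columns."""
    if not columns:
        return 0.0
    scores = [cell_similarity(query.get(c, ""), candidate.get(c, "")) for c in columns]
    return sum(scores) / len(scores)


def select_few_shot(query: Row, pool: List[Row], columns: List[str], k: int = 2) -> List[Row]:
    """Return the k pool rows most similar to the query, best first."""
    ranked = sorted(pool, key=lambda r: row_similarity(query, r, columns), reverse=True)
    return ranked[:k]


# Hypothetical labeled rows that could serve as few-shot examples.
pool = [
    {"name": "Blue Bottle Coffee", "city": "San Francisco", "type": "cafe"},
    {"name": "Joe's Pizza", "city": "New York", "type": "pizzeria"},
    {"name": "Blue Hill", "city": "New York", "type": "restaurant"},
]

# Query row whose "type" value is missing, as in a data imputation task.
query = {"name": "Blue Bottle", "city": "San Francisco"}

examples = select_few_shot(query, pool, columns=["name", "city"], k=2)
print([row["name"] for row in examples])
```

In this sketch the `columns` argument stands in for the output of a column-selection step; the abstract attributes that step to a reinforcement learning-based algorithm, which is not modeled here.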