

Bibliographic Details
Main Authors:	Akella, Ashlesha; Manatkar, Abhijit; Chavda, Brij; Patel, Hima
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2405.05618

_version_	1866911997175529472
author	Akella, Ashlesha; Manatkar, Abhijit; Chavda, Brij; Patel, Hima
author_facet	Akella, Ashlesha; Manatkar, Abhijit; Chavda, Brij; Patel, Hima
contents	Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts. However, creating effective prompts for tabular datasets is challenging due to the structured nature of the data and the need to manage numerous columns. This paper presents an innovative auto-prompt generation system suitable for multiple LLMs, with minimal training. It proposes two novel methods: 1) a reinforcement learning-based algorithm for identifying and sequencing task-relevant columns, and 2) a cell-level similarity-based approach for enhancing few-shot example selection. The approach has been extensively tested across 66 datasets, demonstrating improved performance in three downstream tasks (data imputation, error detection, and entity matching) using two distinct LLMs: Google flan-t5-xxl and Mixtral 8x7B.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_05618
institution	arXiv
publishDate	2024
record_format	arxiv
title	An Automatic Prompt Generation System for Tabular Data Tasks
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2405.05618
