Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Kanyuka, Andriy; Mahfoud, Elias
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2406.10442

_version_	1866909224604270592
author	Kanyuka, Andriy; Mahfoud, Elias
author_facet	Kanyuka, Andriy; Mahfoud, Elias
contents	The generation of structured data in formats such as JSON, YAML and XML is a critical task in Generative AI (GenAI) applications. These formats, while widely used, contain many redundant constructs that lead to inflated token usage. This inefficiency is particularly evident when employing large language models (LLMs) like GPT-4, where generating extensive structured data incurs increased latency and operational costs. We introduce a domain-specific shorthand (DSS) format, underpinned by a context-free grammar (CFG), and demonstrate its usage to reduce the number of tokens required for structured data generation. The method involves creating a shorthand notation that captures essential elements of the output schema with fewer tokens, ensuring it can be unambiguously converted to and from its verbose form. It employs a CFG to facilitate efficient shorthand generation by the LLM, and to create parsers to translate the shorthand back into standard structured formats. The application of our approach to data visualization with LLMs demonstrates a significant (3x to 5x) reduction in generated tokens, leading to significantly lower latency and cost. This paper outlines the development of the DSS and the accompanying CFG, and the implications of this approach for GenAI applications, presenting a scalable solution to the token inefficiency problem in structured data generation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_10442
institution	arXiv
publishDate	2024
record_format	arxiv
title	Domain-Specific Shorthand for Generation Based on Context-Free Grammar
topic	Computation and Language
url	https://arxiv.org/abs/2406.10442
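
Illustrative note: the contents field describes a shorthand that a context-free grammar can expand into a verbose structured format and convert back without ambiguity. The sketch below shows that idea on a toy chart specification. Everything in it is an assumption made for illustration: the grammar, the mark names, the one-letter type codes, and the function names expand and compress are hypothetical and are not taken from the preprint.

```python
import re

# Hypothetical grammar, invented for this sketch (not the preprint's DSS):
#   spec  -> MARK field field
#   field -> NAME ":" TYPE
MARKS = {"bar", "line", "point"}
TYPES = {"q": "quantitative", "o": "ordinal", "n": "nominal", "t": "temporal"}
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|:")


def tokenize(text):
    """Split shorthand into NAME and ':' tokens, rejecting anything else."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != "".join(text.split()):
        raise ValueError(f"unexpected characters in {text!r}")
    return tokens


def parse_field(tokens, pos):
    """Parse one `field -> NAME ':' TYPE` production starting at pos."""
    if (
        pos + 3 > len(tokens)
        or tokens[pos] == ":"
        or tokens[pos + 1] != ":"
        or tokens[pos + 2] not in TYPES
    ):
        raise ValueError(f"expected NAME:TYPE at token {pos}")
    return {"field": tokens[pos], "type": TYPES[tokens[pos + 2]]}, pos + 3


def expand(shorthand):
    """Translate shorthand into the verbose, JSON-serializable form."""
    tokens = tokenize(shorthand)
    if not tokens or tokens[0] not in MARKS:
        raise ValueError("expected a mark: " + ", ".join(sorted(MARKS)))
    x, pos = parse_field(tokens, 1)
    y, pos = parse_field(tokens, pos)
    if pos != len(tokens):
        raise ValueError("trailing tokens after spec")
    return {"mark": tokens[0], "encoding": {"x": x, "y": y}}


def compress(spec):
    """Inverse of expand: turn the verbose form back into shorthand."""
    short = {long_name: code for code, long_name in TYPES.items()}
    enc = spec["encoding"]
    parts = [f'{enc[a]["field"]}:{short[enc[a]["type"]]}' for a in ("x", "y")]
    return " ".join([spec["mark"], *parts])
```

Under these assumptions, expand("bar month:o sales:q") yields a nested dictionary with a "mark" entry and an "encoding" entry for the x and y channels, and compress applied to that dictionary returns the original shorthand string. The round trip works because each grammar production maps to exactly one verbose construct.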

Similar Items