MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Ghosh, Reshmi; Yao, Tianyi; Chen, Lizzy; Hasan, Sadid; Chen, Tianwei; Bernal, Dario; Jiao, Huitian; Hossain, H M Sajjad
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Multiagent Systems
Online Access:	https://arxiv.org/abs/2411.16077

_version_	1866929603795222528
author	Ghosh, Reshmi; Yao, Tianyi; Chen, Lizzy; Hasan, Sadid; Chen, Tianwei; Bernal, Dario; Jiao, Huitian; Hossain, H M Sajjad
author_facet	Ghosh, Reshmi; Yao, Tianyi; Chen, Lizzy; Hasan, Sadid; Chen, Tianwei; Bernal, Dario; Jiao, Huitian; Hossain, H M Sajjad
contents	Large Language Model (LLM) integrations into applications like the Microsoft365 suite and Google Workspace for creating and processing documents, emails, presentations, etc. have led to considerable enhancements in productivity and time savings. But as these integrations become more complex, it is paramount to ensure that the output from LLM-integrated applications is relevant and appropriate for use. Identifying the need to develop robust evaluation approaches for natural language generation where references/ground-truth labels do not exist or are not amply available, this paper introduces a novel framework called "SAGEval", which utilizes a critiquing Agent to provide feedback on scores generated by LLM evaluators. We show that the critiquing Agent is able to rectify scores from LLM evaluators in the absence of references/ground-truth labels, thereby reducing the need for labeled data even for complex NLG evaluation scenarios, such as the generation of JSON-structured forms/surveys with responses in different styles like multiple choice, Likert ratings, single-choice questions, etc.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_16077
institution	arXiv
publishDate	2024
record_format	arxiv
title	SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text
topic	Computation and Language; Multiagent Systems
url	https://arxiv.org/abs/2411.16077