Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Zhang, Shenyu, Li, Yu, Wu, Rui, Huang, Xiutian, Chen, Yongrui, Xu, Wenhao, Qi, Guilin
Formato:	Preprint
Publicado:	2024
Materias:	Computation and Language
Acceso en línea:	https://arxiv.org/abs/2403.11509
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866929281219690496
author	Zhang, Shenyu Li, Yu Wu, Rui Huang, Xiutian Chen, Yongrui Xu, Wenhao Qi, Guilin
author_facet	Zhang, Shenyu Li, Yu Wu, Rui Huang, Xiutian Chen, Yongrui Xu, Wenhao Qi, Guilin
contents	Automatic methods for evaluating machine-generated texts hold significant importance due to the expanding applications of generative systems. Conventional methods tend to grapple with a lack of explainability, issuing a solitary numerical score to signify the assessment outcome. Recent advancements have sought to mitigate this limitation by incorporating large language models (LLMs) to offer more detailed error analyses, yet their applicability remains constrained, particularly in industrial contexts where comprehensive error coverage and swift detection are paramount. To alleviate these challenges, we introduce DEE, a Dual-stage Explainable Evaluation method for estimating the quality of text generation. Built upon Llama 2, DEE follows a dual-stage principle guided by stage-specific instructions to perform efficient identification of errors in generated texts in the initial stage and subsequently delves into providing comprehensive diagnostic reports in the second stage. DEE is fine-tuned on our elaborately assembled dataset AntEval, which encompasses 15K examples from 4 real-world applications of Alipay that employ generative systems. The dataset concerns newly emerged issues like hallucination and toxicity, thereby broadening the scope of DEE's evaluation criteria. Experimental results affirm that DEE's superiority over existing evaluation methods, achieving significant improvements in both human correlation as well as efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_11509
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	DEE: Dual-stage Explainable Evaluation Method for Text Generation Zhang, Shenyu Li, Yu Wu, Rui Huang, Xiutian Chen, Yongrui Xu, Wenhao Qi, Guilin Computation and Language Automatic methods for evaluating machine-generated texts hold significant importance due to the expanding applications of generative systems. Conventional methods tend to grapple with a lack of explainability, issuing a solitary numerical score to signify the assessment outcome. Recent advancements have sought to mitigate this limitation by incorporating large language models (LLMs) to offer more detailed error analyses, yet their applicability remains constrained, particularly in industrial contexts where comprehensive error coverage and swift detection are paramount. To alleviate these challenges, we introduce DEE, a Dual-stage Explainable Evaluation method for estimating the quality of text generation. Built upon Llama 2, DEE follows a dual-stage principle guided by stage-specific instructions to perform efficient identification of errors in generated texts in the initial stage and subsequently delves into providing comprehensive diagnostic reports in the second stage. DEE is fine-tuned on our elaborately assembled dataset AntEval, which encompasses 15K examples from 4 real-world applications of Alipay that employ generative systems. The dataset concerns newly emerged issues like hallucination and toxicity, thereby broadening the scope of DEE's evaluation criteria. Experimental results affirm that DEE's superiority over existing evaluation methods, achieving significant improvements in both human correlation as well as efficiency.
title	DEE: Dual-stage Explainable Evaluation Method for Text Generation
topic	Computation and Language
url	https://arxiv.org/abs/2403.11509

Ejemplares similares