Staff View: Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework :: Library Catalog

Bibliographic Details
Main Authors:	Pradeep, Ronak; Thakur, Nandan; Upadhyay, Shivani; Campos, Daniel; Craswell, Nick; Lin, Jimmy
Format:	Preprint
Published:	2024
Subjects:	Information Retrieval; Computation and Language
Online Access:	https://arxiv.org/abs/2411.09607

_version_	1866917837770063872
author	Pradeep, Ronak; Thakur, Nandan; Upadhyay, Shivani; Campos, Daniel; Craswell, Nick; Lin, Jimmy
author_facet	Pradeep, Ronak; Thakur, Nandan; Upadhyay, Shivani; Campos, Daniel; Craswell, Nick; Lin, Jimmy
contents	This report provides an initial look at partial results from the TREC 2024 Retrieval-Augmented Generation (RAG) Track. We have identified RAG evaluation as a barrier to continued progress in information access (and more broadly, natural language processing and artificial intelligence), and it is our hope that we can contribute to tackling the many challenges in this space. The central hypothesis we explore in this work is that the nugget evaluation methodology, originally developed for the TREC Question Answering Track in 2003, provides a solid foundation for evaluating RAG systems. As such, our efforts have focused on "refactoring" this methodology, specifically applying large language models to both automatically create nuggets and to automatically assign nuggets to system answers. We call this the AutoNuggetizer framework. Within the TREC setup, we are able to calibrate our fully automatic process against a manual process whereby nuggets are created by human assessors semi-manually and then assigned manually to system answers. Based on initial results across 21 topics from 45 runs, we observe a strong correlation between scores derived from a fully automatic nugget evaluation and a (mostly) manual nugget evaluation by human assessors. This suggests that our fully automatic evaluation process can be used to guide future iterations of RAG systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_09607
institution	arXiv
publishDate	2024
record_format	arxiv
title	Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework
topic	Information Retrieval; Computation and Language
url	https://arxiv.org/abs/2411.09607

Similar Items