Staff View: Retrieval-Augmented Generation with Estimation of Source Reliability :: Library Catalog

Bibliographic Details
Main Authors:	Hwang, Jeongyeon; Park, Junyoung; Park, Hyejin; Kim, Dongwoo; Park, Sangdon; Ok, Jungseul
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2410.22954

_version_	1866908590129807360
author	Hwang, Jeongyeon; Park, Junyoung; Park, Hyejin; Kim, Dongwoo; Park, Sangdon; Ok, Jungseul
author_facet	Hwang, Jeongyeon; Park, Junyoung; Park, Hyejin; Kim, Dongwoo; Park, Sangdon; Ok, Jungseul
contents	Retrieval-Augmented Generation (RAG) is an effective approach to enhance the factual accuracy of large language models (LLMs) by retrieving information from external databases, which are typically composed of diverse sources, to supplement the limited internal knowledge of LLMs. However, the standard RAG often risks retrieving incorrect information, as it relies solely on relevance between a query and a document, overlooking the heterogeneous reliability of these sources. To address this issue, we propose Reliability-Aware RAG (RA-RAG), a new multi-source RAG framework that estimates the reliability of sources and leverages this information to prioritize highly reliable and relevant documents, ensuring more robust and accurate response generation. Specifically, RA-RAG first estimates source reliability by cross-checking information across multiple sources. It then retrieves documents from the top-$κ$ reliable and relevant sources and aggregates their information using weighted majority voting (WMV), where the selective retrieval ensures scalability while not compromising the performance. Comprehensive experiments show that RA-RAG consistently outperforms baselines in scenarios with heterogeneous source reliability while scaling efficiently as the number of sources increases. Furthermore, we demonstrate the ability of RA-RAG to estimate real-world sources' reliability, highlighting its practical applicability. Our code and data are available at https://github.com/ml-postech/RA-RAG.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_22954
institution	arXiv
publishDate	2024
record_format	arxiv
title	Retrieval-Augmented Generation with Estimation of Source Reliability
topic	Machine Learning
url	https://arxiv.org/abs/2410.22954
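
Illustrative note: the abstract describes retrieving from the top-κ reliable and relevant sources and aggregating their answers with weighted majority voting (WMV). The sketch below is one minimal reading of that idea and is not taken from the authors' code. The function name, the dictionary inputs, and the choice to use the raw reliability estimate as the vote weight are all assumptions made for illustration; the paper may define the weights and the selection rule differently.

```python
from collections import defaultdict


def weighted_majority_vote(answers, reliability, kappa):
    """Pick an answer by reliability-weighted voting (illustrative sketch).

    answers:     dict mapping source id -> answer string, for sources that
                 returned a relevant document (hypothetical input format).
    reliability: dict mapping source id -> estimated reliability in [0, 1].
    kappa:       number of most reliable sources allowed to vote.
    """
    # Keep only the kappa most reliable sources among those that answered.
    selected = sorted(answers, key=lambda s: reliability[s], reverse=True)[:kappa]

    # Sum the reliability weights behind each distinct answer.
    # Assumption: the weight is the raw reliability estimate.
    scores = defaultdict(float)
    for source in selected:
        scores[answers[source]] += reliability[source]

    # No sources means no answer.
    if not scores:
        return None

    # The answer with the largest total weight wins.
    return max(scores, key=scores.get)


# Toy example: two low-reliability sources disagree with one reliable source.
answers = {"s1": "Paris", "s2": "Lyon", "s3": "Lyon", "s4": "Paris"}
reliability = {"s1": 0.9, "s2": 0.4, "s3": 0.3, "s4": 0.2}
result = weighted_majority_vote(answers, reliability, kappa=3)
print(result)
```

In the toy example, an unweighted vote over the three selected sources would favor "Lyon" (two votes to one), while the weighted vote favors the answer backed by the single most reliable source (0.9 against 0.4 + 0.3). Restricting the vote to κ sources is what keeps the cost bounded as the number of sources grows.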
