Bibliographic Details
Main Authors: Hwang, Jeongyeon; Park, Junyoung; Park, Hyejin; Kim, Dongwoo; Park, Sangdon; Ok, Jungseul
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2410.22954
_version_ 1866908590129807360
author Hwang, Jeongyeon
Park, Junyoung
Park, Hyejin
Kim, Dongwoo
Park, Sangdon
Ok, Jungseul
author_facet Hwang, Jeongyeon
Park, Junyoung
Park, Hyejin
Kim, Dongwoo
Park, Sangdon
Ok, Jungseul
contents Retrieval-Augmented Generation (RAG) is an effective approach to enhancing the factual accuracy of large language models (LLMs): it retrieves information from external databases, which are typically composed of diverse sources, to supplement the limited internal knowledge of LLMs. However, standard RAG risks retrieving incorrect information because it relies solely on the relevance between a query and a document, overlooking the heterogeneous reliability of the sources. To address this issue, we propose Reliability-Aware RAG (RA-RAG), a new multi-source RAG framework that estimates the reliability of sources and uses these estimates to prioritize highly reliable and relevant documents, yielding more robust and accurate response generation. Specifically, RA-RAG first estimates source reliability by cross-checking information across multiple sources. It then retrieves documents from the top-$κ$ reliable and relevant sources and aggregates their information using weighted majority voting (WMV); the selective retrieval provides scalability without compromising performance. Comprehensive experiments show that RA-RAG consistently outperforms baselines in scenarios with heterogeneous source reliability while scaling efficiently as the number of sources increases. Furthermore, we demonstrate the ability of RA-RAG to estimate the reliability of real-world sources, highlighting its practical applicability. Our code and data are available at https://github.com/ml-postech/RA-RAG.
format Preprint
id arxiv_https___arxiv_org_abs_2410_22954
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Retrieval-Augmented Generation with Estimation of Source Reliability
Hwang, Jeongyeon
Park, Junyoung
Park, Hyejin
Kim, Dongwoo
Park, Sangdon
Ok, Jungseul
Machine Learning
title Retrieval-Augmented Generation with Estimation of Source Reliability
topic Machine Learning
url https://arxiv.org/abs/2410.22954
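
The abstract in this record describes two steps that may be easier to follow with a small example: selecting the top-κ reliable and relevant sources, then aggregating their answers by weighted majority voting (WMV). The following is only a minimal sketch under stated assumptions, not the authors' implementation. The function names `select_top_k` and `weighted_majority_vote`, the use of log-odds weights, the pre-computed reliability scores, and the set of relevant sources are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from collections import defaultdict


def select_top_k(reliability, relevant, k):
    """Return up to k relevant sources, most reliable first.

    Assumption: `reliability` maps source name -> estimated probability of
    being correct, and `relevant` is the set of sources that returned a
    document relevant to the query.
    """
    candidates = [s for s in reliability if s in relevant]
    candidates.sort(key=lambda s: reliability[s], reverse=True)
    return candidates[:k]


def weighted_majority_vote(answers, reliability, eps=1e-6):
    """Aggregate per-source answers, weighting each vote by source reliability.

    Assumption: each vote is weighted by the log-odds log(p / (1 - p)) of the
    source's reliability p, a common weighting for WMV. Reliabilities are
    clipped to (eps, 1 - eps) to keep the logarithm finite.
    """
    scores = defaultdict(float)
    for source, answer in answers.items():
        p = min(max(reliability[source], eps), 1.0 - eps)
        scores[answer] += math.log(p / (1.0 - p))
    return max(scores, key=scores.get)


# Hypothetical reliabilities and answers for a single query.
reliability = {"a": 0.95, "b": 0.6, "c": 0.6, "d": 0.99}
relevant = {"a", "b", "c"}  # source "d" returned nothing relevant
answers = {"a": "Paris", "b": "Lyon", "c": "Lyon"}

chosen = select_top_k(reliability, relevant, k=3)
result = weighted_majority_vote({s: answers[s] for s in chosen}, reliability)
```

In this toy case two of the three sources answer "Lyon", so an unweighted majority would pick it, whereas the single highly reliable source outweighs them under the log-odds weighting. How reliability is actually estimated by cross-checking sources is outside the scope of this sketch.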