Enregistré dans:
Détails bibliographiques
Auteurs principaux: Wu, Di, Gu, Jia-Chen, Chang, Kai-Wei, Peng, Nanyun
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2504.01018
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866911356618276864
author Wu, Di
Gu, Jia-Chen
Chang, Kai-Wei
Peng, Nanyun
author_facet Wu, Di
Gu, Jia-Chen
Chang, Kai-Wei
Peng, Nanyun
contents Selective retrieval aims to make retrieval-augmented generation (RAG) more efficient and reliable by skipping retrieval when an LLM's parametric knowledge suffices. Despite promising results, existing methods are constrained by a binary design choice: either retrieve from a single external source or skip retrieval and let the LLM directly produce the final answer. We argue that this fallback underestimates the model's knowledge and obscures the more general multi-source decision problem that arises in practical systems. We propose Self-Routing RAG (SR-RAG), which casts selective retrieval as knowledge source selection and treats the LLM itself as a first-class knowledge source. SR-RAG learns to select an appropriate knowledge source, optionally verbalize parametric knowledge, and answer using the selected source, all within a single left-to-right generation pass. SR-RAG further augments source selection by combining LLM-based uncertainty with a flexible external policy datastore to improve decision calibration. Across four benchmarks and three 7B-class LLMs, SR-RAG outperforms a strong selective retrieval baseline by 8.5%/2.1%/4.7% while performing 26%/40%/21% fewer retrievals, and it achieves favorable accuracy-latency trade-offs without dataset-specific threshold tuning.
format Preprint
id arxiv_https___arxiv_org_abs_2504_01018
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization
Wu, Di
Gu, Jia-Chen
Chang, Kai-Wei
Peng, Nanyun
Computation and Language
Selective retrieval aims to make retrieval-augmented generation (RAG) more efficient and reliable by skipping retrieval when an LLM's parametric knowledge suffices. Despite promising results, existing methods are constrained by a binary design choice: either retrieve from a single external source or skip retrieval and let the LLM directly produce the final answer. We argue that this fallback underestimates the model's knowledge and obscures the more general multi-source decision problem that arises in practical systems. We propose Self-Routing RAG (SR-RAG), which casts selective retrieval as knowledge source selection and treats the LLM itself as a first-class knowledge source. SR-RAG learns to select an appropriate knowledge source, optionally verbalize parametric knowledge, and answer using the selected source, all within a single left-to-right generation pass. SR-RAG further augments source selection by combining LLM-based uncertainty with a flexible external policy datastore to improve decision calibration. Across four benchmarks and three 7B-class LLMs, SR-RAG outperforms a strong selective retrieval baseline by 8.5%/2.1%/4.7% while performing 26%/40%/21% fewer retrievals, and it achieves favorable accuracy-latency trade-offs without dataset-specific threshold tuning.
title Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization
topic Computation and Language
url https://arxiv.org/abs/2504.01018