Staff View: Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yicheng; Qin, Zhen; Wu, Zhaomin; Zhang, Wenqi; Deng, Shuiguang
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.03645

_version_	1866914304425459712
author	Zhang, Yicheng; Qin, Zhen; Wu, Zhaomin; Zhang, Wenqi; Deng, Shuiguang
author_facet	Zhang, Yicheng; Qin, Zhen; Wu, Zhaomin; Zhang, Wenqi; Deng, Shuiguang
contents	Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on how well the retriever matches the LLM. Retriever optimization has emerged as an efficient alternative to fine-tuning LLMs. However, existing solutions suffer from an objective mismatch between retriever optimization and the goal of the RAG pipeline. Reinforcement learning (RL) provides a promising solution to this limitation, yet applying RL to retriever optimization introduces two fundamental challenges: 1) deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making the retriever optimizable by RL. Further, we incorporate retrieval history into the state at each retrieval step to mitigate state aliasing. Extensive experiments across diverse RAG pipelines, datasets, and retriever scales demonstrate that our approach consistently improves RAG performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_03645
institution	arXiv
publishDate	2026
record_format	arxiv
title	Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG
topic	Machine Learning
url	https://arxiv.org/abs/2602.03645

Similar Items