Bibliographic Details
Main Authors: Zhang, Yicheng; Qin, Zhen; Wu, Zhaomin; Zhang, Wenqi; Deng, Shuiguang
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.03645
_version_ 1866914304425459712
author Zhang, Yicheng
Qin, Zhen
Wu, Zhaomin
Zhang, Wenqi
Deng, Shuiguang
author_facet Zhang, Yicheng
Qin, Zhen
Wu, Zhaomin
Zhang, Wenqi
Deng, Shuiguang
contents Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on how well the retriever is matched to the LLM. Retriever optimization has emerged as an efficient alternative to fine-tuning LLMs. However, existing solutions suffer from a mismatch between the retriever's optimization objective and the goal of the RAG pipeline. Reinforcement learning (RL) offers a promising way to address this limitation, yet applying RL to retriever optimization introduces two fundamental challenges: 1) deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making the retriever optimizable by RL. Further, we incorporate the retrieval history into the state at each retrieval step to mitigate state aliasing. Extensive experiments across diverse RAG pipelines, datasets, and retriever scales demonstrate that our approach consistently improves RAG performance.
format Preprint
id arxiv_https___arxiv_org_abs_2602_03645
institution arXiv
publishDate 2026
record_format arxiv
title Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG
topic Machine Learning
url https://arxiv.org/abs/2602.03645
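
The abstract names two ideas: replacing deterministic retrieval with stochastic sampling, and adding the retrieval history to the state at each step. The sketch below is only an illustration of those two ideas, not the authors' implementation. The function names, the temperature parameter, sequential softmax sampling without replacement, and the " [SEP] " separator are all assumptions introduced here; the record does not specify any of them.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_documents(scores, k, rng, temperature=1.0):
    """Sample k distinct document indices instead of taking the top-k.

    Hypothetical helper: documents are drawn one at a time, without
    replacement, from a softmax over the remaining scores. The summed
    log-probability is returned so a policy-gradient method could use it.
    """
    remaining = list(range(len(scores)))
    chosen, log_prob = [], 0.0
    for _ in range(min(k, len(remaining))):
        probs = softmax([scores[i] for i in remaining], temperature)
        pick = rng.choices(range(len(remaining)), weights=probs, k=1)[0]
        log_prob += math.log(probs[pick])
        chosen.append(remaining.pop(pick))
    return chosen, log_prob


def build_state(query, history):
    """Assumed history-aware state: the query plus earlier retrieved texts."""
    return " [SEP] ".join([query, *history])


if __name__ == "__main__":
    rng = random.Random(0)
    indices, log_prob = sample_documents([2.0, 1.0, 0.5, -1.0], k=2, rng=rng)
    print(indices, log_prob)
    print(build_state("who founded X?", ["X was founded in 1990 by Y."]))
```

Because the retrieved set is sampled rather than fixed, the same state can yield different actions with known probabilities, which is what an RL formulation requires. Including earlier retrievals in the state lets two hops that share the same query text be told apart.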