Staff View: Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward :: Library Catalog


Bibliographic Details
Main Authors:	Hu, Senkang; Dai, Yong; Zhao, Yuzhi; Tao, Yihang; Guo, Yu; Fang, Zhengru; Kwong, Sam Tak Wu; Fang, Yuguang
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.00845

_version_	1866908822195404800
author	Hu, Senkang; Dai, Yong; Zhao, Yuzhi; Tao, Yihang; Guo, Yu; Fang, Zhengru; Kwong, Sam Tak Wu; Fang, Yuguang
author_facet	Hu, Senkang; Dai, Yong; Zhao, Yuzhi; Tao, Yihang; Guo, Yu; Fang, Zhengru; Kwong, Sam Tak Wu; Fang, Yuguang
contents	Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner
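The abstract combines three ideas: clustering sampled answers by bidirectional textual entailment, measuring uncertainty over those semantic clusters, and feeding the resulting reward into GRPO. The Python sketch below illustrates one plausible reading under stated assumptions; it is not the InfoReasoner implementation (see the linked repository for that). Assumptions: uncertainty is approximated by the entropy of cluster frequencies over a set of sampled answers; information gain is the drop in that entropy between answers sampled before and after a retrieval step; clustering is greedy against each cluster's first member. `toy_entails` is a hypothetical placeholder for a real textual-entailment model, and all function names are illustrative. The advantage function follows the standard GRPO group normalization (reward minus group mean, divided by group standard deviation).

```python
import math
from collections.abc import Callable, Sequence

Entails = Callable[[str, str], bool]


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Hypothetical stand-in for an NLI model: "entails" iff normalized text matches.
    return premise.strip().lower() == hypothesis.strip().lower()


def cluster_by_bidirectional_entailment(answers: Sequence[str], entails: Entails) -> list[list[str]]:
    # Greedy clustering: an answer joins a cluster only if it and the cluster's
    # first member entail each other in both directions.
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(answers: Sequence[str], entails: Entails) -> float:
    # Entropy (in nats) of the empirical distribution over semantic clusters.
    if not answers:
        raise ValueError("need at least one sampled answer")
    n = len(answers)
    clusters = cluster_by_bidirectional_entailment(answers, entails)
    return sum((len(c) / n) * math.log(n / len(c)) for c in clusters)


def information_gain(before: Sequence[str], after: Sequence[str], entails: Entails) -> float:
    # Assumed reward: uncertainty before a retrieval step minus uncertainty after it.
    return semantic_entropy(before, entails) - semantic_entropy(after, entails)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    # Standard GRPO normalization within a group of rollouts for the same prompt.
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    before = ["Paris", "Lyon", "paris", "Marseille"]  # three clusters: 2/4, 1/4, 1/4
    after = ["Paris", "paris", "PARIS", "Paris"]      # one cluster
    print(round(information_gain(before, after, toy_entails), 4))  # → 1.0397
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

With a single cluster the entropy is zero, so here the whole gain equals the pre-retrieval entropy, 1.5 ln 2. Whether the authors clip, normalize, or combine this quantity with an answer-correctness reward is not stated in this record.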
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_00845
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward Hu, Senkang Dai, Yong Zhao, Yuzhi Tao, Yihang Guo, Yu Fang, Zhengru Kwong, Sam Tak Wu Fang, Yuguang Artificial Intelligence Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner
title	Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.00845

Similar Items