Bibliographic Details
Main Authors: Hu, Senkang, Dai, Yong, Zhao, Yuzhi, Tao, Yihang, Guo, Yu, Fang, Zhengru, Kwong, Sam Tak Wu, Fang, Yuguang
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2602.00845
author Hu, Senkang
Dai, Yong
Zhao, Yuzhi
Tao, Yihang
Guo, Yu
Fang, Zhengru
Kwong, Sam Tak Wu
Fang, Yuguang
contents Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving an average accuracy improvement of up to 5.4%. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner
format Preprint
id arxiv_https___arxiv_org_abs_2602_00845
institution arXiv
publishDate 2026
record_format arxiv
title Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward
topic Artificial Intelligence
url https://arxiv.org/abs/2602.00845