Staff View: ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards :: Library Catalog

Bibliographic Details
Main Authors:	Li, Shiyu; Tang, Yang; Wang, Yifan; Li, Peiming; Chen, Xi
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.00568

_version_	1866913100118097920
author	Li, Shiyu; Tang, Yang; Wang, Yifan; Li, Peiming; Chen, Xi
author_facet	Li, Shiyu; Tang, Yang; Wang, Yifan; Li, Peiming; Chen, Xi
contents	Search agents powered by Large Language Models (LLMs) have demonstrated significant potential in tackling knowledge-intensive tasks. Reinforcement learning (RL) has emerged as a powerful paradigm for training these agents to perform complex, multi-step reasoning. However, prior RL-based methods often rely on sparse or rule-based rewards, which can lead agents to commit to suboptimal or erroneous reasoning paths without the ability to recover. To address these limitations, we propose ReSeek, a novel self-correcting framework for training search agents. Our framework introduces a self-correction mechanism that enables the agent to dynamically identify and recover from erroneous search paths during an episode. By invoking a special JUDGE action, the agent can assess the retrieved information and re-plan its search strategy. To guide this process, we design a dense, instructive process reward function that decomposes into a correctness reward for retrieving factual information and a utility reward for finding information genuinely useful for the query. Furthermore, to mitigate the risk of data contamination in existing datasets, we introduce FictionalHot, a new and challenging benchmark of recently curated questions that require complex reasoning. ReSeek is intuitively reasonable and practically simple, and extensive experiments show that agents trained with it significantly outperform state-of-the-art (SOTA) baselines in task success rate and path faithfulness.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_00568
institution	arXiv
publishDate	2025
record_format	arxiv
title	ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards
topic	Computation and Language
url	https://arxiv.org/abs/2510.00568