Staff View: AgentIR: Reasoning-Aware Retrieval for Deep Research Agents :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Zijian; Ma, Xueguang; Zhuang, Shengyao; Lin, Jimmy; Asai, Akari; Zhong, Victor
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.04384

_version_	1866917325617233920
author	Chen, Zijian; Ma, Xueguang; Zhuang, Shengyao; Lin, Jimmy; Asai, Akari; Zhong, Victor
author_facet	Chen, Zijian; Ma, Xueguang; Zhuang, Shengyao; Lin, Jimmy; Asai, Akari; Zhong, Victor
contents	Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without documenting their intermediate thought processes, Deep Research agents generate explicit natural language reasoning before each search call, revealing rich intent and contextual information that existing retrievers entirely ignore. To exploit this overlooked signal, we introduce: (1) Reasoning-Aware Retrieval, a retrieval paradigm that jointly embeds the agent's reasoning trace alongside its query; and (2) DR-Synth, a data synthesis method that generates Deep Research retriever training data from standard QA datasets. We demonstrate that both components are independently effective, and their combination yields a trained embedding model, AgentIR-4B, with substantial gains. On the challenging BrowseComp-Plus benchmark, AgentIR-4B achieves 68% accuracy with the open-weight agent Tongyi-DeepResearch, compared to 50% with conventional embedding models twice its size, and 37% with BM25. Code and data are available at: https://texttron.github.io/AgentIR/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_04384
institution	arXiv
publishDate	2026
record_format	arxiv
title	AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
topic	Computation and Language
url	https://arxiv.org/abs/2603.04384

Similar Items