MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Zhong, Shuzhang; Lu, Baotong; Chen, Qi; Liu, Chuanjie; Yang, Fan; Li, Meng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.07416

_version_	1866911496141799424
author	Zhong, Shuzhang; Lu, Baotong; Chen, Qi; Liu, Chuanjie; Yang, Fan; Li, Meng
author_facet	Zhong, Shuzhang; Lu, Baotong; Chen, Qi; Liu, Chuanjie; Yang, Fan; Li, Meng
contents	Large language model-based deep research agents have become increasingly popular for addressing long-horizon information-seeking tasks, but they often incur high end-to-end latency due to extensive reasoning and frequent tool use. Speculation frameworks aim to reduce latency by overlapping action execution with reasoning; however, existing approaches typically rely on uniform speculation strategies and strict action matching, which limit inference speedups and robustness. In this work, we revisit the speculate-verify paradigm for deep research agents through the lens of action heterogeneity. We show that Search and Visit actions exhibit fundamentally different reasoning and model capacity requirements: entropy-based analysis reveals that Search decisions have higher uncertainty and benefit significantly from explicit reasoning, whereas Visit decisions have lower entropy and depend primarily on model capacity. Motivated by this dual-process characteristic, we propose DualSpec, a heterogeneous speculation framework equipped with a lightweight, confidence-based semantic verifier. Experiments across multiple models and benchmarks demonstrate that DualSpec achieves up to 3.28× end-to-end speedup while maintaining accuracy comparable to fully reasoning agents.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_07416
institution	arXiv
publishDate	2026
record_format	arxiv
title	DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation
topic	Machine Learning
url	https://arxiv.org/abs/2603.07416
