Staff View: Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis :: Library Catalog

Bibliographic Details
Main Authors:	Zhai, Zhiyuan; Yan, Wenjing; Shao, Xiaodan; Wang, Xin
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.14877

_version_	1866913038215413760
author	Zhai, Zhiyuan; Yan, Wenjing; Shao, Xiaodan; Wang, Xin
author_facet	Zhai, Zhiyuan; Yan, Wenjing; Shao, Xiaodan; Wang, Xin
contents	Does reinforcement learning genuinely expand what LLM agents can do, or merely make them more reliable? For static reasoning, recent work answers the second: base and RL pass@k curves converge at large k. We ask whether this holds for agentic tool use, where T rounds of interaction enable compositional strategies that re-sampling cannot recover. We introduce PASS@(k,T), a two-dimensional metric that jointly varies sampling budget k and interaction depth T, separating capability expansion from efficiency improvement. Our main finding is that, contrary to the static-reasoning result, tool-use RL genuinely enlarges the capability boundary: the RL agent's pass-curve pulls above the base model's and the gap widens at large k rather than converging. The expansion is specific to compositional, sequential information gathering; on simpler tasks RL behaves as prior work predicts. Under matched training data, supervised fine-tuning regresses the boundary on the same compositional tasks, isolating self-directed exploration as the causal factor. Mechanism analysis shows RL reweights the base strategy distribution toward the subset whose downstream reasoning more often yields a correct answer, with the improvement concentrated on how the agent integrates retrieved information. These results reconcile optimistic and pessimistic readings of RL for LLMs: both are correct, on different task types.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_14877
institution	arXiv
publishDate	2026
record_format	arxiv
title	Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis
topic	Machine Learning
url	https://arxiv.org/abs/2604.14877