MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Hu, Minyang; Yang, Bo; Zhou, Zhinuo; Liang, Jiachen; Jiahao, Guo; Yin, Yiyang; Han, Xiongwei
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.29893

author	Hu, Minyang; Yang, Bo; Zhou, Zhinuo; Liang, Jiachen; Jiahao, Guo; Yin, Yiyang; Han, Xiongwei
contents	LLM-based agents have demonstrated strong capabilities in solving complex tasks through multi-step reasoning and tool use. However, existing evaluation protocols focus primarily on task success, overlooking a critical aspect of agent behavior: execution efficiency. In practice, agent trajectories often contain redundant steps that consume substantial resources while contributing little to task completion. In this work, we propose and formulate a new research area: redundant step detection for agent trajectories. To support it, we introduce RedundancyBench, a new benchmark of diverse tasks with carefully annotated trajectories, in which each step is labeled according to its contribution to task completion. Using RedundancyBench, we develop and evaluate three representative methods for deciding whether a step within a trajectory is redundant or necessary. Our results show that even the best-performing method achieves a score of only 24.88% in detecting redundant steps, while some methods perform worse than random guessing. These results highlight the complexity of the task and the need for further research in this area. The code and dataset for this paper are available at https://anonymous.4open.science/r/RedundancyBench.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_29893
institution	arXiv
publishDate	2026
record_format	arxiv
title	Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.29893

Similar Items