Staff View: PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning :: Library Catalog

Bibliographic Details
Main Authors:	Li, Yu; Cai, Guangfeng; Yang, Shengtian; Luo, Han; Han, Shuo; He, Xu; Li, Dong; Feng, Lei
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.13691

_version_	1866912904397193216
author	Li, Yu; Cai, Guangfeng; Yang, Shengtian; Luo, Han; Han, Shuo; He, Xu; Li, Dong; Feng, Lei
contents	Recent advances in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon, multi-step tool planning remains challenging because the exploration space grows combinatorially. In this setting, even when a correct tool-use path is found, it typically serves only as an immediate reward for the current training step and provides no reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns that can be leveraged throughout the entire training process. Inspired by ant colony optimization, in which pheromone trails record historically successful paths, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., a pheromone) from historical trajectories and then uses the learned pheromone to guide policy optimization. The learned pheromone provides explicit, reusable guidance that steers policy optimization toward historically successful tool transitions, thereby improving long-horizon tool planning. Comprehensive experiments demonstrate the effectiveness of PhGPO.
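The abstract describes the pheromone only at a high level. As a rough illustration of the ant-colony idea it borrows, the sketch below keeps one pheromone value per (previous tool, next tool) transition, evaporates all values each round, and deposits pheromone along successful trajectories; it then scores a trajectory, or a set of candidate next tools, by that pheromone. This is not the PhGPO method: the class name `PheromoneTable`, the `START` sentinel, and the parameters `rho`, `deposit`, and `eps` are hypothetical, and the classic ACO evaporate-and-deposit rule stands in for the learned pheromone, whose exact form this record does not specify.

```python
START = "<start>"  # hypothetical sentinel marking the beginning of a trajectory


class PheromoneTable:
    """Hypothetical ACO-style pheromone over tool-to-tool transitions.

    Uses the classic evaporate-then-deposit rule; NOT the PhGPO formulation.
    """

    def __init__(self, rho=0.1, deposit=1.0):
        self.rho = rho          # evaporation rate per update round (assumed)
        self.deposit = deposit  # total pheromone laid per successful trajectory (assumed)
        self.tau = {}           # (prev_tool, next_tool) -> pheromone; unseen = 0.0

    @staticmethod
    def transitions(trajectory):
        # Pair consecutive tool calls, with START as the first "previous tool".
        path = [START] + list(trajectory)
        return list(zip(path[:-1], path[1:]))

    def update(self, successful_trajectories):
        # 1) Evaporate every stored transition so stale patterns fade.
        for edge in self.tau:
            self.tau[edge] *= 1.0 - self.rho
        # 2) Deposit pheromone on the transitions of each successful trajectory,
        #    splitting the deposit evenly across its transitions.
        for traj in successful_trajectories:
            edges = self.transitions(traj)
            if not edges:
                continue
            share = self.deposit / len(edges)
            for edge in edges:
                self.tau[edge] = self.tau.get(edge, 0.0) + share

    def bonus(self, trajectory):
        # Mean pheromone along a trajectory: higher means it reuses
        # transitions seen in historically successful trajectories.
        edges = self.transitions(trajectory)
        if not edges:
            return 0.0
        return sum(self.tau.get(e, 0.0) for e in edges) / len(edges)

    def next_tool_probs(self, prev_tool, candidates, eps=1e-3):
        # Normalize pheromone over candidate next tools; eps keeps
        # never-rewarded tools reachable so exploration is not cut off.
        weights = [self.tau.get((prev_tool, c), 0.0) + eps for c in candidates]
        total = sum(weights)
        return {c: w / total for c, w in zip(candidates, weights)}


if __name__ == "__main__":
    table = PheromoneTable(rho=0.1, deposit=1.0)
    table.update([["search", "fetch", "summarize"]])
    print(table.bonus(["search", "fetch", "summarize"]))
    print(table.bonus(["search", "calculator"]))
    print(table.next_tool_probs("search", ["fetch", "calculator"]))
```

In a policy-optimization loop, a score such as `bonus` could, under these assumptions, be added to the task reward as a shaping term, or `next_tool_probs` could serve as a prior over the next tool call; the record does not say which form of guidance the authors use.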
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13691
institution	arXiv
publishDate	2026
record_format	arxiv
title	PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.13691