Bibliographic Details
Main Authors: Li, Yu; Cai, Guangfeng; Yang, Shengtian; Luo, Han; Han, Shuo; He, Xu; Li, Dong; Feng, Lei
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.13691
_version_ 1866912904397193216
author Li, Yu
Cai, Guangfeng
Yang, Shengtian
Luo, Han
Han, Shuo
He, Xu
Li, Dong
Feng, Lei
contents Large Language Model (LLM) agents have recently demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon, multi-step tool planning remains challenging because the exploration space suffers from combinatorial explosion. In this setting, even when a correct tool-use path is found, it is typically treated only as an immediate reward for the current training and provides no reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns that can be leveraged throughout the entire training process. Inspired by ant colony optimization, in which pheromone reflects historically successful paths, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., pheromone) from historical trajectories and then uses the learned pheromone to guide policy optimization. The learned pheromone provides explicit, reusable guidance that steers policy optimization toward historically successful tool transitions, thereby improving long-horizon tool planning. Comprehensive experimental results demonstrate the effectiveness of PhGPO.
format Preprint
id arxiv_https___arxiv_org_abs_2602_13691
institution arXiv
publishDate 2026
record_format arxiv
title PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
topic Artificial Intelligence
url https://arxiv.org/abs/2602.13691