Bibliographic Details
Main Authors: Liu, Jiaxi; Jiang, Yanzuo; Zhang, Guibin; Zhang, Zihan; Chang, Heng; Yin, Zhenfei; Ren, Qibing; Yan, Junchi
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.07839
Abstract:
  • Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.