Saved in:
Bibliographic Details
Main Authors: Tian, Xiaoyu, Wang, Haotian, Chen, Shuaiting, Zhou, Hao, Yu, Kaichi, Zhang, Yudian, Ouyang, Jade, Yin, Junxi, Chen, Jiong, Guo, Baoyan, Zhang, Lei, Tao, Junjie, Song, Yuansheng, Cui, Ming, Liu, Chengwei
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2601.21558
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • Large language models (LLMs) are increasingly used as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on non-verifiable simulated environments, rely exclusively on either supervised fine-tuning (SFT) or reinforcement learning (RL), and struggle with stable long-horizon, multi-turn learning. To address these challenges, we introduce ASTRA, a fully automated end-to-end framework for training tool-augmented language model agents via scalable data synthesis and verifiable reinforcement learning. ASTRA integrates two complementary components. First, a pipeline that leverages the static topology of tool-call graphs synthesizes diverse, structurally grounded trajectories, instilling broad and transferable tool-use competence. Second, an environment synthesis framework that captures the rich, compositional topology of human semantic reasoning converts decomposed question-answer traces into independent, code-executable, and rule-verifiable environments, enabling deterministic multi-turn RL. Based on this method, we develop a unified training methodology that integrates SFT with online RL using trajectory-level rewards to balance task completion and interaction efficiency. Experiments on multiple agentic tool-use benchmarks demonstrate that ASTRA-trained models achieve state-of-the-art performance at comparable scales, approaching closed-source systems while preserving core reasoning ability. We release the full pipelines, environments, and trained models at https://github.com/LianjiaTech/astra.