Bibliographic Details
Main Authors: Chen, Guhong; Sun, Chenghao; Fu, Cheng; Wang, Qiyao; Huang, Zhihong; Wei, Chaopeng; Chen, Guangxu; Fang, Feiteng; Argha, Ahmadreza; Zhao, Bing; Xu, Xander; Han, Qi; Alinejad-Rokny, Hamid; Qu, Qiang; Li, Binhua; Ni, Shiwen; Yang, Min; Wei, Hu; Li, Yongbin
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.03219
author Chen, Guhong
Sun, Chenghao
Fu, Cheng
Wang, Qiyao
Huang, Zhihong
Wei, Chaopeng
Chen, Guangxu
Fang, Feiteng
Argha, Ahmadreza
Zhao, Bing
Xu, Xander
Han, Qi
Alinejad-Rokny, Hamid
Qu, Qiang
Li, Binhua
Ni, Shiwen
Yang, Min
Wei, Hu
Li, Yongbin
contents As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling. Moreover, quantity-centric scaling exhibits an early bottleneck that underutilizes trajectory data. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off for agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, τ²-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
format Preprint
id arxiv_https___arxiv_org_abs_2602_03219
institution arXiv
publishDate 2026
record_format arxiv
title Beyond Quantity: Trajectory Diversity Scaling for Code Agents
topic Artificial Intelligence
url https://arxiv.org/abs/2602.03219
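
note The abstract names Domain Entropy as one signal that steers synthesis toward long-tail scenarios, but it gives no formula. The sketch below is an illustration only, not the authors' implementation: it assumes that such a signal could be plain Shannon entropy over per-trajectory domain labels, and that an underrepresented domain could be chosen by lowest count. The function names and the domain labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumption: a stand-in for a diversity signal such as "Domain Entropy";
    the record does not state the paper's actual definition.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pick_underrepresented(labels, candidates):
    """Return the candidate with the fewest occurrences in labels.

    Ties are broken by the order of candidates. Hypothetical helper showing
    how synthesis might be nudged toward rarely covered domains.
    """
    counts = Counter(labels)
    return min(candidates, key=lambda c: counts[c])


if __name__ == "__main__":
    # Hypothetical domain labels for already-synthesized trajectories.
    seen = ["finance", "finance", "travel"]
    print(shannon_entropy(seen))
    print(pick_underrepresented(seen, ["finance", "travel", "health"]))
```

Under these assumptions, a higher entropy value means trajectories are spread more evenly across domains, and a collapse toward a single domain drives the value to zero.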