| Main Authors: | Cheng, Zihao; Wang, Hongru; Liu, Zeming; Wang, Xinyi; Zhu, Xiangrong; Guo, Yuhang; Lin, Wei; Pan, Jeff Z.; Wang, Yunhong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.20876 |
Similar Items
Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation
by: Cheng, Zihao, et al.
Published: (2026)
ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models
by: Cheng, Zihao, et al.
Published: (2025)
Endless Terminals: Scaling RL Environments for Terminal Agents
by: Gandhi, Kanishk, et al.
Published: (2026)
LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents
by: Peng, Xiaoxuan, et al.
Published: (2026)
TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
by: Lu, Yuheng, et al.
Published: (2025)
Defenses & Enablers For Skill Injection Attacks on Terminal Based Agents
by: Fujinuma, Yoshinari, et al.
Published: (2026)
Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments
by: Wu, Siwei, et al.
Published: (2026)
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst
by: Wang, Hongru, et al.
Published: (2025)
RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models
by: Liu, Jingjing, et al.
Published: (2025)
DocOS: Towards Proactive Document-Guided Actions in GUI Agents
by: Liu, Jingjing, et al.
Published: (2026)
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
by: Ren, Jincheng, et al.
Published: (2026)
Terminal Agents Suffice for Enterprise Automation
by: Bechard, Patrice, et al.
Published: (2026)
The Scaling Laws of Skills in LLM Agent Systems
by: Chen, Charles, et al.
Published: (2026)
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
by: Dong, Guanting, et al.
Published: (2026)
Exploring In-Image Machine Translation with Real-World Background
by: Tian, Yanzhi, et al.
Published: (2025)
DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue
by: Li, Xiang, et al.
Published: (2025)
Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges
by: Wang, Hongru, et al.
Published: (2025)
PhoneWorld: Scaling Phone-Use Agent Environments
by: Tang, Zhengyang, et al.
Published: (2026)
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards
by: Da, Jeff, et al.
Published: (2025)
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
by: Jin, Chang, et al.
Published: (2026)
Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges
by: Shi, Xiaoming, et al.
Published: (2024)
Skill-as-Pseudocode: Refactoring Skill Libraries to Pseudocode for LLM Agents
by: Li, Xinze, et al.
Published: (2026)
On Data Engineering for Scaling LLM Terminal Capabilities
by: Pi, Renjie, et al.
Published: (2026)
TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine
by: Cheng, Zihao, et al.
Published: (2025)
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks
by: Chu, Zhaoyang, et al.
Published: (2026)
HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices
by: Li, Silin, et al.
Published: (2025)
Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation
by: Yao, Jiashu, et al.
Published: (2026)
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
by: Huang, Wenyu, et al.
Published: (2024)
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
by: Zhu, Kaijie, et al.
Published: (2026)
SEAL: Synergistic Co-Evolution of Agents and Learning Environments
by: Hu, Yihao, et al.
Published: (2026)
Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents
by: Wang, Jingxing, et al.
Published: (2026)
SkillMAS: Skill Co-Evolution with LLM-based Multi-Agent System
by: Pan, Shuai, et al.
Published: (2026)
Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR
by: Tang, Yihong, et al.
Published: (2026)
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs
by: Li, Xiaoyuan, et al.
Published: (2026)
Memento-Skills: Let Agents Design Agents
by: Zhou, Huichi, et al.
Published: (2026)
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
by: Xu, Minrui, et al.
Published: (2026)
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
by: Qian, Cheng, et al.
Published: (2025)
EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
by: Tian, Han, et al.
Published: (2026)
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus
by: Shi, Xiaoming, et al.
Published: (2025)
Deterministic Reversible Data Augmentation for Neural Machine Translation
by: Yao, Jiashu, et al.
Published: (2024)