Bibliographic Details
Main authors: Zhang, Kai; Chen, Xiangchao; Liu, Bo; Xue, Tianci; Liao, Zeyi; Liu, Zhihan; Wang, Xiyao; Ning, Yuting; Chen, Zhaorun; Fu, Xiaohan; Xie, Jian; Sun, Yuxuan; Gou, Boyu; Qi, Qi; Meng, Zihang; Yang, Jianwei; Zhang, Ning; Li, Xian; Shah, Ashish; Huynh, Dat; Li, Hengduo; Yang, Zi; Cao, Sara; Jang, Lawrence; Zhou, Shuyan; Zhu, Jiacheng; Sun, Huan; Weston, Jason; Su, Yu; Wu, Yifan
Format: Preprint
Published: 2025
Subjects:
Online access: https://arxiv.org/abs/2510.08558
contents A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios, and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm, we study two strategies of using such data: (1) implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. Evaluation across eight diverse environments and multiple model families shows that our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, making it a practical bridge between imitation learning and fully experience-driven agents.
id arxiv_https___arxiv_org_abs_2510_08558
institution arXiv
record_format arxiv
title Agent Learning via Early Experience
topic Artificial Intelligence
Computation and Language
Information Retrieval
Machine Learning