| Main Authors: | Xi, Zhiheng, Ding, Yiwen, Chen, Wenxiang, Hong, Boyang, Guo, Honglin, Wang, Junzhe, Yang, Dingwen, Liao, Chenyang, Guo, Xin, He, Wei, Gao, Songyang, Chen, Lu, Zheng, Rui, Zou, Yicheng, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04151 |
Similar Items
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
Better Process Supervision with Bi-directional Rewarding Signals
by: Chen, Wenxiang, et al.
Published: (2025)
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
by: Xi, Zhiheng, et al.
Published: (2025)
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
by: Shen, Yujiong, et al.
Published: (2026)
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2024)
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
by: Wang, Zirui, et al.
Published: (2026)
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
by: Lin, Jiahang, et al.
Published: (2026)
AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts
by: Fang, Shicheng, et al.
Published: (2026)
CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization
by: Chen, Yitong, et al.
Published: (2026)
OceanGym: A Benchmark Environment for Underwater Embodied Agents
by: Xue, Yida, et al.
Published: (2025)
Agent Alignment in Evolving Social Norms
by: Li, Shimin, et al.
Published: (2024)
Pre-Trained Policy Discriminators are General Reward Models
by: Dou, Shihan, et al.
Published: (2025)
CritiQ: Mining Data Quality Criteria from Human Preferences
by: Guo, Honglin, et al.
Published: (2025)
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
by: Lv, Kai, et al.
Published: (2025)
UserBench: An Interactive Gym Environment for User-Centric Agents
by: Qian, Cheng, et al.
Published: (2025)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
by: Xi, Zhiheng, et al.
Published: (2024)
NARRA-Gym for Evaluating Interactive Narrative Agents
by: Huang, Yue, et al.
Published: (2026)
Gym-Anything: Turn any Software into an Agent Environment
by: Aggarwal, Pranjal, et al.
Published: (2026)
AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems
by: Wang, Weiyi, et al.
Published: (2026)
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
by: Wang, Junzhe, et al.
Published: (2026)
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
by: Xi, Zhiheng, et al.
Published: (2023)
AgentV-RL: Scaling Reward Modeling with Agentic Verifier
by: Zhang, Jiazheng, et al.
Published: (2026)
Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
by: Guo, Xin, et al.
Published: (2025)
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
by: Zhao, Jun, et al.
Published: (2024)
AEL: Agent Evolving Learning for Open-Ended Environments
by: Xu, Wujiang, et al.
Published: (2026)
NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment
by: Mangla, Shashank, et al.
Published: (2025)
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
by: Xia, Han, et al.
Published: (2024)
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
by: He, Zhengfu, et al.
Published: (2024)
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
by: Zhou, Xin, et al.
Published: (2025)
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
by: Lin, Jiahang, et al.
Published: (2026)
EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents
by: Ma, Sai, et al.
Published: (2026)
AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents
by: Wang, Xiaoxing, et al.
Published: (2026)
VehicleWorld: A Highly Integrated Multi-Device Environment for Intelligent Vehicle Interaction
by: Yang, Jie, et al.
Published: (2025)
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
by: Wang, Siyuan, et al.
Published: (2024)
When Agents Evolve, Institutions Follow
by: Fei, Chao, et al.
Published: (2026)
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
by: Wang, Bowen, et al.
Published: (2026)
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
by: Khan, Zaid, et al.
Published: (2024)
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
by: Bai, Hao, et al.
Published: (2026)
Can RL Improve Generalization of LLM Agents? An Empirical Study
by: Xi, Zhiheng, et al.
Published: (2026)