| Main Authors: | Mai, Xinji, Xu, Haotian, Li, Zhong-Zhi, W, Xing, Wang, Weinong, Hu, Jian, Zhang, Yingying, Zhang, Wenqiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.07773 |
Similar Items
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
by: Zhang, Hao, et al.
Published: (2026)
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
by: Zhang, Qiang, et al.
Published: (2026)
AgentV-RL: Scaling Reward Modeling with Agentic Verifier
by: Zhang, Jiazheng, et al.
Published: (2026)
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
by: Cao, Shiyi, et al.
Published: (2025)
Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?
by: Chen, Wanyi, et al.
Published: (2026)
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
by: Jiang, Xue, et al.
Published: (2025)
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
by: Lai, Hanyu, et al.
Published: (2025)
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving
by: Gao, Songyang, et al.
Published: (2025)
Exploring Communication Strategies for Collaborative LLM Agents in Mathematical Problem-Solving
by: Zhang, Liang, et al.
Published: (2025)
Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation
by: Ouyang, Shuyin, et al.
Published: (2026)
Can RL Improve Generalization of LLM Agents? An Empirical Study
by: Xi, Zhiheng, et al.
Published: (2026)
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
by: Feng, Zhangying, et al.
Published: (2025)
AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
by: Zhang, Hanchen, et al.
Published: (2025)
RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?
by: Xu, Haotian, et al.
Published: (2025)
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)
Proto Successor Measure: Representing the Behavior Space of an RL Agent
by: Agarwal, Siddhant, et al.
Published: (2024)
Evolving and Executing Research Plans via Double-Loop Multi-Agent Collaboration
by: Zhang, Zhi, et al.
Published: (2025)
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
by: Li, Weizhen, et al.
Published: (2025)
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
by: Chang, Qikai, et al.
Published: (2025)
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
by: Dai, Weinan, et al.
Published: (2026)
Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
by: Cullen, Carissa, et al.
Published: (2026)
Meta-RL Induces Exploration in Language Agents
by: Jiang, Yulun, et al.
Published: (2025)
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)
A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
by: Sancaktar, Cansu, et al.
Published: (2026)
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
by: Zheng, Junhao, et al.
Published: (2025)
ProgAgent: A Continual RL Agent with Progress-Aware Rewards
by: Tan, Jinzhou, et al.
Published: (2026)
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
by: Zeng, Yifan, et al.
Published: (2026)
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent
by: Li, Wanli, et al.
Published: (2026)
Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents
by: Fan, Zhiyuan, et al.
Published: (2026)
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
by: Phan, Huy Nhat, et al.
Published: (2024)
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
by: Lei, Bin, et al.
Published: (2024)
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph
by: Zhang, Duzhen, et al.
Published: (2025)
ProSpec RL: Plan Ahead, then Execute
by: Liu, Liangliang, et al.
Published: (2024)
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
by: Gou, Zhibin, et al.
Published: (2023)
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
by: Xia, Peng, et al.
Published: (2025)
Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL
by: Huang, Jiawei, et al.
Published: (2024)
From Efficient Multimodal Models to World Models: A Survey
by: Mai, Xinji, et al.
Published: (2024)
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
by: Li, Pengxiang, et al.
Published: (2025)
GenAI-based Multi-Agent Reinforcement Learning towards Distributed Agent Intelligence: A Generative-RL Agent Perspective
by: Wang, Hang, et al.
Published: (2025)