| Main Authors: | Hu, Tianyi; Fu, Qingxu; Chen, Yanxi; Liu, Zhaoyang; Ding, Bolin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.06554 |
Similar Items
Voting with the Graph: Stable RLAIF via Topological Consistency Maximization
by: Liu, Boyin, et al.
Published: (2025)
CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL
by: Mai, Shinji, et al.
Published: (2025)
Unreal-MAP: Unreal-Engine-Based General Platform for Multi-Agent Reinforcement Learning
by: Hu, Tianyi, et al.
Published: (2025)
Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights
by: Chen, Yanxi, et al.
Published: (2024)
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
by: Zhang, Wenhao, et al.
Published: (2025)
Provable Scaling Laws for the Test-Time Compute of Large Language Models
by: Chen, Yanxi, et al.
Published: (2024)
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
by: Pan, Xuchen, et al.
Published: (2024)
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
by: Chen, Yanxi, et al.
Published: (2023)
AgentEvolver: Towards Efficient Self-Evolving Agent System
by: Zhai, Yunpeng, et al.
Published: (2025)
Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems
by: Fu, Qingxu, et al.
Published: (2024)
Clip Your Sequences Fairly: Enforcing Length Fairness for Sequence-Level RL
by: Mao, Hanyi, et al.
Published: (2025)
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
by: Wang, Tianyi, et al.
Published: (2026)
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
by: Yao, Chaorui, et al.
Published: (2025)
MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization
by: Yang, Zhuo, et al.
Published: (2025)
Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph
by: Fu, Qingxu, et al.
Published: (2024)
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
by: Cen, Zhepeng, et al.
Published: (2025)
Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?
by: Chen, Wanyi, et al.
Published: (2026)
Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution
by: Cao, Zouying, et al.
Published: (2025)
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)
AgentV-RL: Scaling Reward Modeling with Agentic Verifier
by: Zhang, Jiazheng, et al.
Published: (2026)
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
by: Hu, Zengjie, et al.
Published: (2025)
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework
by: Zhi, Zhuo, et al.
Published: (2025)
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
by: Wang, Prince Zizhuang, et al.
Published: (2026)
Skill Reuse as Compression in Agentic RL
by: Xu, Zhikun, et al.
Published: (2026)
E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning
by: Zhang, Lingzhe, et al.
Published: (2026)
The Hierarchy of Agentic Capabilities: Evaluating Frontier Models on Realistic RL Environments
by: Ritchie, Logan, et al.
Published: (2026)
On Effectiveness and Efficiency of Agentic Tool-calling and RL Training
by: Liu, Tong, et al.
Published: (2026)
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
by: Ou, Jingyang, et al.
Published: (2025)
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent
by: Li, Wanli, et al.
Published: (2026)
PyVision-RL: Forging Open Agentic Vision Models via RL
by: Zhao, Shitian, et al.
Published: (2026)
Auto-Rubric: Learning From Implicit Weights to Explicit Rubrics for Reward Modeling
by: Xie, Lipeng, et al.
Published: (2025)
Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification
by: Li, Yanxi, et al.
Published: (2025)
ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning
by: Gao, Jingyue, et al.
Published: (2026)
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees
by: Kim, Dohyeong, et al.
Published: (2024)
Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries
by: Ganesh, Swetha, et al.
Published: (2024)
The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
by: Zhang, Zhaoyang, et al.
Published: (2026)
Speaking at the Right Level: Literacy-Controlled Counterspeech Generation with RAG-RL
by: Song, Xiaoying, et al.
Published: (2025)
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
by: Gao, Jiaxuan, et al.
Published: (2025)
See it. Say it. Sorted: Agentic System for Compositional Diagram Generation
by: Zhang, Hantao, et al.
Published: (2025)
Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning
by: Li, Yu, et al.
Published: (2026)