| Main Authors: | Gao, Jiaxuan; Chen, Jiaao; He, Chuyi; Xu, Shusheng; Jin, Di; Wu, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.22607 |
Similar Items
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
by: Gao, Jiaxuan, et al.
Published: (2025)
EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair
by: Chen, Jiaao, et al.
Published: (2026)
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
by: Cao, Shiyi, et al.
Published: (2025)
Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards
by: Li, Ming, et al.
Published: (2025)
Toward Scalable Verifiable Reward: Proxy State-Based Evaluation for Multi-turn Tool-Calling LLM Agents
by: Chuang, Yun-Shiuan, et al.
Published: (2026)
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
by: Li, Pengxiang, et al.
Published: (2025)
RewardHarness: Self-Evolving Agentic Post-Training
by: Zhang, Yuxuan, et al.
Published: (2026)
Debate as Reward: A Multi-Agent Reward System for Scientific Ideation via RL Post-Training
by: Salimi, Moein, et al.
Published: (2026)
AgentV-RL: Scaling Reward Modeling with Agentic Verifier
by: Zhang, Jiazheng, et al.
Published: (2026)
Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)
Diagnosing and Mitigating System Bias in Self-Rewarding RL
by: Tan, Chuyi, et al.
Published: (2025)
Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards
by: Normann, Philipp, et al.
Published: (2026)
EVE-Agent: Evidence-Verifiable Self-Evolving Agents
by: Arai, Yamato, et al.
Published: (2026)
GMTRouter: Personalized LLM Router over Multi-turn User Interactions
by: Xie, Encheng, et al.
Published: (2025)
Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards
by: Pavlenko, Kirill, et al.
Published: (2026)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)
Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant
by: Dupuis, Nicolas, et al.
Published: (2025)
GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
by: Cao, Yuan, et al.
Published: (2026)
Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents
by: Fan, Zhiyuan, et al.
Published: (2026)
ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning
by: Tang, Lingxiao, et al.
Published: (2026)
ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)
SEVerA: Verified Synthesis of Self-Evolving Agents
by: Banerjee, Debangshu, et al.
Published: (2026)
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
by: Goldie, Anna, et al.
Published: (2025)
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
by: Chen, Yixing, et al.
Published: (2025)
LAGOON: Language-Guided Motion Control
by: Xu, Shusheng, et al.
Published: (2023)
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
by: Chen, Sixiang, et al.
Published: (2026)
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
by: Xue, Xiangyuan, et al.
Published: (2025)
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
by: Fu, Wei, et al.
Published: (2025)
The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)
Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC
by: Yu, Cunxi, et al.
Published: (2026)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
by: Zhang, Hao, et al.
Published: (2026)
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)
PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards
by: Ghimire, Mukesh, et al.
Published: (2026)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions
by: Almansoori, Mohammad, et al.
Published: (2025)
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
by: Qi, Zehan, et al.
Published: (2024)
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
by: Acikgoz, Emre Can, et al.
Published: (2026)
Sparse Rewards Can Self-Train Dialogue Agents
by: Lattimer, Barrett Martin, et al.
Published: (2024)