| Main Authors: | Bai, Bizhe, Wang, Xinyue, Ye, Peng, Chen, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02555 |
Similar Items
Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space
by: Mangannavar, Rajesh, et al.
Published: (2024)
Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space
by: Zhang, Xinyu, et al.
Published: (2025)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
by: Bai, Bizhe, et al.
Published: (2025)
Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
by: Kar, Avik, et al.
Published: (2024)
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)
Reward Models in Deep Reinforcement Learning: A Survey
by: Yu, Rui, et al.
Published: (2025)
Symmetry in Neural Network Parameter Spaces
by: Zhao, Bo, et al.
Published: (2025)
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
by: Liu, Shuze Daniel, et al.
Published: (2024)
Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
by: de la Rosa, Raul, et al.
Published: (2026)
Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards
by: Yoon, Deokgyu, et al.
Published: (2026)
DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)
Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
by: Zhang, Feng, et al.
Published: (2026)
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
by: Zhang, Zijing, et al.
Published: (2025)
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
by: Wang, Zhen, et al.
Published: (2025)
Offline Reinforcement Learning with Penalized Action Noise Injection
by: Oh, JunHyeok, et al.
Published: (2025)
Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space
by: Liu, Qianmei, et al.
Published: (2024)
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
by: Lee, Hojoon, et al.
Published: (2024)
Skill Expansion and Composition in Parameter Space
by: Liu, Tenglong, et al.
Published: (2025)
ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging
by: Li, Guan, et al.
Published: (2024)
Reimagining Parameter Space Exploration with Diffusion Models
by: Zhang, Lijun, et al.
Published: (2025)
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)
A Meta-Level Learning Algorithm for Sequential Hyper-Parameter Space Reduction in AutoML
by: Borboudakis, Giorgos, et al.
Published: (2023)
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
by: Li, Long, et al.
Published: (2025)
Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
by: Shin, Yongjae, et al.
Published: (2026)
Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning
by: Li, Yuxuan, et al.
Published: (2026)
Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)
Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)
Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation
by: Li, Xin-Ye, et al.
Published: (2026)
Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards
by: Dave, Rudray, et al.
Published: (2026)
Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design
by: Choi, Jaemoo, et al.
Published: (2026)
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
by: Tenedini, Davide, et al.
Published: (2025)
Multi-Task Reinforcement Learning Enables Parameter Scaling
by: McLean, Reginald, et al.
Published: (2025)
DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay
by: Li, Long, et al.
Published: (2026)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
by: Yan, Kai, et al.
Published: (2026)
Gradient-Free Noise Optimization for Reward Alignment in Generative Models
by: Kim, Jeongsol, et al.
Published: (2026)
Contrastive Learning with Nasty Noise
by: Zhao, Ziruo
Published: (2025)