:: Library Catalog

Bibliographic Details
Main Authors:	Bai, Bizhe; Wang, Xinyue; Ye, Peng; Chen, Tao
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.02555

Similar Items

Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space
by: Mangannavar, Rajesh, et al.
Published: (2024)

Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space
by: Zhang, Xinyu, et al.
Published: (2025)

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)

Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)

M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
by: Bai, Bizhe, et al.
Published: (2025)

Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
by: Kar, Avik, et al.
Published: (2024)

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)

Reward Models in Deep Reinforcement Learning: A Survey
by: Yu, Rui, et al.
Published: (2025)

Symmetry in Neural Network Parameter Spaces
by: Zhao, Bo, et al.
Published: (2025)

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
by: Liu, Shuze Daniel, et al.
Published: (2024)

Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
by: de la Rosa, Raul, et al.
Published: (2026)

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards
by: Yoon, Deokgyu, et al.
Published: (2026)

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
by: Zhang, Feng, et al.
Published: (2026)

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
by: Zhang, Zijing, et al.
Published: (2025)

Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
by: Wang, Zhen, et al.
Published: (2025)

Offline Reinforcement Learning with Penalized Action Noise Injection
by: Oh, JunHyeok, et al.
Published: (2025)

Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space
by: Liu, Qianmei, et al.
Published: (2024)

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
by: Lee, Hojoon, et al.
Published: (2024)

Skill Expansion and Composition in Parameter Space
by: Liu, Tenglong, et al.
Published: (2025)

ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging
by: Li, Guan, et al.
Published: (2024)

Reimagining Parameter Space Exploration with Diffusion Models
by: Zhang, Lijun, et al.
Published: (2025)

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)

A Meta-Level Learning Algorithm for Sequential Hyper-Parameter Space Reduction in AutoML
by: Borboudakis, Giorgos, et al.
Published: (2023)

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
by: Li, Long, et al.
Published: (2025)

Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
by: Shin, Yongjae, et al.
Published: (2026)

Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning
by: Li, Yuxuan, et al.
Published: (2026)

Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)

Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation
by: Li, Xin-Ye, et al.
Published: (2026)

Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards
by: Dave, Rudray, et al.
Published: (2026)

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design
by: Choi, Jaemoo, et al.
Published: (2026)

From Parameters to Behaviors: Unsupervised Compression of the Policy Space
by: Tenedini, Davide, et al.
Published: (2025)

Multi-Task Reinforcement Learning Enables Parameter Scaling
by: McLean, Reginald, et al.
Published: (2025)

DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay
by: Li, Long, et al.
Published: (2026)

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
by: Yan, Kai, et al.
Published: (2026)

Gradient-Free Noise Optimization for Reward Alignment in Generative Models
by: Kim, Jeongsol, et al.
Published: (2026)

Contrastive Learning with Nasty Noise
by: Zhao, Ziruo
Published: (2025)