| Main Authors: | Krishnan, Rohit, Evans, Jon |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.12165 |
Similar Items
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
by: Zhang, Feng, et al.
Published: (2026)
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)
Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards
by: Yoon, Deokgyu, et al.
Published: (2026)
DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
by: Wang, Zhen, et al.
Published: (2025)
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
Agentic Reinforcement Learning for Real-World Code Repair
by: Zhu, Siyu, et al.
Published: (2025)
FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards
by: Han, Zhixin, et al.
Published: (2026)
Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards
by: Dave, Rudray, et al.
Published: (2026)
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
by: Lara, Luis, et al.
Published: (2026)
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
by: Ma, Zhengzhao, et al.
Published: (2026)
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
by: Zhang, Zijing, et al.
Published: (2025)
Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards
by: Rad, Ali, et al.
Published: (2026)
Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards
by: Guo, Kai-Yuan, et al.
Published: (2026)
Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
by: Liu, Shuze Daniel, et al.
Published: (2026)
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
by: Li, Long, et al.
Published: (2025)
Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation
by: Zhou, Jiang, et al.
Published: (2026)
Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
by: Yan, Kai, et al.
Published: (2026)
Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)
MarketBench: Evaluating AI Agents as Market Participants
by: Fradkin, Andrey, et al.
Published: (2026)
An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
by: Plesner, Andreas, et al.
Published: (2026)
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
by: Wu, Fang, et al.
Published: (2025)
DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay
by: Li, Long, et al.
Published: (2026)
Reward Hacking Mitigation using Verifiable Composite Rewards
by: Tarek, Mirza Farhan Bin, et al.
Published: (2025)
A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning
by: Wachi, Akifumi, et al.
Published: (2026)
Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards
by: Bai, Bizhe, et al.
Published: (2026)
Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
by: Shen, Yiran, et al.
Published: (2025)
SWE-Universe: Scale Real-World Verifiable Environments to Millions
by: Chen, Mouxiang, et al.
Published: (2026)
The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)