:: Library Catalog

Bibliographic Details
Main Authors:	Krishnan, Rohit; Evans, Jon
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.12165

Similar Items

Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
by: Zhang, Feng, et al.
Published: (2026)

Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards
by: Yoon, Deokgyu, et al.
Published: (2026)

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)

Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
by: Wang, Zhen, et al.
Published: (2025)

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)

Agentic Reinforcement Learning for Real-World Code Repair
by: Zhu, Siyu, et al.
Published: (2025)

FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards
by: Han, Zhixin, et al.
Published: (2026)

Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards
by: Dave, Rudray, et al.
Published: (2026)

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
by: Lara, Luis, et al.
Published: (2026)

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
by: Ma, Zhengzhao, et al.
Published: (2026)

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
by: Zhang, Zijing, et al.
Published: (2025)

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards
by: Rad, Ali, et al.
Published: (2026)

Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards
by: Guo, Kai-Yuan, et al.
Published: (2026)

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
by: Liu, Shuze Daniel, et al.
Published: (2026)

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
by: Li, Long, et al.
Published: (2025)

Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation
by: Zhou, Jiang, et al.
Published: (2026)

Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
by: Yan, Kai, et al.
Published: (2026)

Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)

MarketBench: Evaluating AI Agents as Market Participants
by: Fradkin, Andrey, et al.
Published: (2026)

An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
by: Plesner, Andreas, et al.
Published: (2026)

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
by: Wu, Fang, et al.
Published: (2025)

DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay
by: Li, Long, et al.
Published: (2026)

Reward Hacking Mitigation using Verifiable Composite Rewards
by: Tarek, Mirza Farhan Bin, et al.
Published: (2025)

A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning
by: Wachi, Akifumi, et al.
Published: (2026)

Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards
by: Bai, Bizhe, et al.
Published: (2026)

Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)

Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
by: Shen, Yiran, et al.
Published: (2025)

SWE-Universe: Scale Real-World Verifiable Environments to Millions
by: Chen, Mouxiang, et al.
Published: (2026)

The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)