:: Library Catalog

Bibliographic Details
Main Authors:	Bahloul, Ahmed; Malberg, Simon
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2507.13142

Similar Items

A Comprehensive Evaluation of Cognitive Biases in LLMs
by: Malberg, Simon, et al.
Published: (2024)

Framework of Thoughts: A Foundation Framework for Dynamic and Optimized Reasoning based on Chains, Trees, and Graphs
by: Fricke, Felix, et al.
Published: (2026)

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)

Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
by: Zhang, Kongcheng, et al.
Published: (2025)

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)

Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)

Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
by: Chen, Sirui, et al.
Published: (2026)

Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
by: Cheng, Ruoxi, et al.
Published: (2025)

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards
by: Tian, Wei, et al.
Published: (2026)

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
by: Ma, Zhengzhao, et al.
Published: (2026)

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing
by: Liao, Jianxing, et al.
Published: (2025)

Enhancing LLM Reasoning with Reward-guided Tree Search
by: Jiang, Jinhao, et al.
Published: (2024)

Reinforcement Learning with Conditional Expectation Reward
by: Xiao, Changyi, et al.
Published: (2026)

A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning
by: Wachi, Akifumi, et al.
Published: (2026)

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
by: Wu, Fang, et al.
Published: (2025)

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
by: Shen, Wei, et al.
Published: (2024)

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
by: Jiao, Fangkai, et al.
Published: (2024)

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
by: Liang, Tian, et al.
Published: (2025)

Exploring Reasoning Reward Model for Agents
by: Fan, Kaixuan, et al.
Published: (2026)

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)

From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
by: Tahmasbi, Amir, et al.
Published: (2025)

From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning
by: Zhang, Shihao, et al.
Published: (2026)

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
by: Xie, Tianbao, et al.
Published: (2023)

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Generation
by: Liu, David Y., et al.
Published: (2026)

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
by: Lara, Luis, et al.
Published: (2026)

The Art of Efficient Reasoning: Data, Reward, and Optimization
by: Wu, Taiqiang, et al.
Published: (2026)

Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)

Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning
by: Chen, Zhuoen, et al.
Published: (2026)

From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
by: Mishra, Shubhra, et al.
Published: (2024)

ReCode: Reinforcing Code Generation with Reasoning-Process Rewards
by: Fan, Lishui, et al.
Published: (2025)

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
by: Hong, Haitao, et al.
Published: (2025)

Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
by: Cao, Meng, et al.
Published: (2024)

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)

ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models
by: Chen, Bin, et al.
Published: (2025)

Reward-Guided Speculative Decoding for Efficient LLM Reasoning
by: Liao, Baohao, et al.
Published: (2025)

PACR: Progressively Ascending Confidence Reward for LLM Reasoning
by: Yoon, Eunseop, et al.
Published: (2025)

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)