| Main Authors: | Bahloul, Ahmed, Malberg, Simon |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.13142 |
Similar Items
A Comprehensive Evaluation of Cognitive Biases in LLMs
by: Malberg, Simon, et al.
Published: (2024)
Framework of Thoughts: A Foundation Framework for Dynamic and Optimized Reasoning based on Chains, Trees, and Graphs
by: Fricke, Felix, et al.
Published: (2026)
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
by: Zhang, Kongcheng, et al.
Published: (2025)
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)
Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)
Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
by: Chen, Sirui, et al.
Published: (2026)
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
by: Cheng, Ruoxi, et al.
Published: (2025)
CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards
by: Tian, Wei, et al.
Published: (2026)
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
by: Ma, Zhengzhao, et al.
Published: (2026)
RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing
by: Liao, Jianxing, et al.
Published: (2025)
Enhancing LLM Reasoning with Reward-guided Tree Search
by: Jiang, Jinhao, et al.
Published: (2024)
Reinforcement Learning with Conditional Expectation Reward
by: Xiao, Changyi, et al.
Published: (2026)
A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning
by: Wachi, Akifumi, et al.
Published: (2026)
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
by: Wu, Fang, et al.
Published: (2025)
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
by: Shen, Wei, et al.
Published: (2024)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
by: Jiao, Fangkai, et al.
Published: (2024)
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
by: Liang, Tian, et al.
Published: (2025)
Exploring Reasoning Reward Model for Agents
by: Fan, Kaixuan, et al.
Published: (2026)
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)
From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
by: Tahmasbi, Amir, et al.
Published: (2025)
From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning
by: Zhang, Shihao, et al.
Published: (2026)
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
by: Xie, Tianbao, et al.
Published: (2023)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Generation
by: Liu, David Y., et al.
Published: (2026)
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
by: Lara, Luis, et al.
Published: (2026)
The Art of Efficient Reasoning: Data, Reward, and Optimization
by: Wu, Taiqiang, et al.
Published: (2026)
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)
Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning
by: Chen, Zhuoen, et al.
Published: (2026)
From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
by: Mishra, Shubhra, et al.
Published: (2024)
ReCode: Reinforcing Code Generation with Reasoning-Process Rewards
by: Fan, Lishui, et al.
Published: (2025)
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
by: Hong, Haitao, et al.
Published: (2025)
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
by: Cao, Meng, et al.
Published: (2024)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)
ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models
by: Chen, Bin, et al.
Published: (2025)
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
by: Liao, Baohao, et al.
Published: (2025)
PACR: Progressively Ascending Confidence Reward for LLM Reasoning
by: Yoon, Eunseop, et al.
Published: (2025)
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)