| Main Authors: | Liao, Mengqi, Xi, Xiangyu, Chen, Ruinian, Leng, Jia, Hu, Yangen, Zeng, Ke, Liu, Shuai, Wan, Huaiyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.18573 |
Similar Items
Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning
by: Qin, Zhanyue, et al.
Published: (2026)
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
by: Huang, Wei, et al.
Published: (2025)
DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration
by: Gu, Tianteng, et al.
Published: (2025)
Reinforcement Learning Enhanced LLMs: A Survey
by: Wang, Shuhe, et al.
Published: (2024)
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)
Optimizing Large Language Models with an Enhanced LoRA Fine-Tuning Algorithm for Efficiency and Robustness in NLP Tasks
by: Hu, Jiacheng, et al.
Published: (2024)
When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs
by: Wang, Shaowen, et al.
Published: (2025)
Large Language Model Prompt Datasets: An In-depth Analysis and Insights
by: Zhang, Yuanming, et al.
Published: (2025)
BroRL: Scaling Reinforcement Learning via Broadened Exploration
by: Hu, Jian, et al.
Published: (2025)
Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling
by: Hao, Bingguang, et al.
Published: (2025)
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
by: Ma, Ruotian, et al.
Published: (2025)
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
by: Leng, Yongqi, et al.
Published: (2024)
Reinforcement Learning for Tool-Integrated Interleaved Thinking towards Cross-Domain Generalization
by: Chen, Zhengyu, et al.
Published: (2025)
Enhancing LLM Knowledge Learning through Generalization
by: Zhu, Mingkang, et al.
Published: (2025)
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
by: Li, Shangzhan, et al.
Published: (2025)
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
by: Tang, Xinyu, et al.
Published: (2025)
Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents
by: Zhu, Mingkang, et al.
Published: (2025)
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
by: Zheng, Chujie, et al.
Published: (2025)
Exploration Hacking: Can LLMs Learn to Resist RL Training?
by: Jang, Eyon, et al.
Published: (2026)
Self-Hinting Language Models Enhance Reinforcement Learning
by: Liao, Baohao, et al.
Published: (2026)
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
by: Hu, Jingcheng, et al.
Published: (2025)
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
by: Wu, Jinyang, et al.
Published: (2025)
ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs
by: Zhang, Bonan, et al.
Published: (2025)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
by: Wan, Ziyu, et al.
Published: (2025)
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)
Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
by: Huang, Yiming, et al.
Published: (2026)
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
by: Huang, Guanhua, et al.
Published: (2025)
SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models
by: Huo, Yifu, et al.
Published: (2026)
Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning
by: Ding, Fei, et al.
Published: (2026)
Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models
by: Zhang, Fuxiang, et al.
Published: (2024)
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach
by: Zhang, Xinnan, et al.
Published: (2025)
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)
Natural Language Reinforcement Learning
by: Feng, Xidong, et al.
Published: (2024)
DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning
by: Karaman, Batuhan K., et al.
Published: (2026)
ToolExpander: Extending the Frontiers of Tool-Using Reinforcement Learning to Weak LLMs
by: Chen, Fu, et al.
Published: (2025)
Efficient Exploration for LLMs
by: Dwaracherla, Vikranth, et al.
Published: (2024)
Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling
by: Guo, Yiran, et al.
Published: (2026)
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
by: Li, Pengyi, et al.
Published: (2026)
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
by: Deng, Wenlong, et al.
Published: (2025)
One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning
by: Li, Yiyuan, et al.
Published: (2026)