| Main Authors: | Xie, Zhihui, Chen, Jie, Chen, Liyu, Mao, Weichao, Xu, Jingjing, Kong, Lingpeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.03492 |
Similar Items
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models
by: Zhao, Xueliang, et al.
Published: (2025)
$\mathbf{(N,K)}$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model
by: Zhang, Yufeng, et al.
Published: (2024)
Scaling Reasoning without Attention
by: Zhao, Xueliang, et al.
Published: (2025)
DeepCritic: Deliberate Critique with Large Language Models
by: Yang, Wenkai, et al.
Published: (2025)
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
by: Li, Junsong, et al.
Published: (2025)
Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models
by: Sim, Shamus, et al.
Published: (2024)
Self-Generated Critiques Boost Reward Modeling for Language Models
by: Yu, Yue, et al.
Published: (2024)
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
by: Xu, Ran, et al.
Published: (2025)
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
by: Ye, Jiacheng, et al.
Published: (2024)
Self-Evolving Critique Abilities in Large Language Models
by: Tang, Zhengyang, et al.
Published: (2025)
Scalable Oversight for Superhuman AI via Recursive Self-Critiquing
by: Wen, Xueru, et al.
Published: (2025)
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
by: Liu, Wei, et al.
Published: (2025)
Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling
by: Guo, Yiran, et al.
Published: (2026)
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
by: Tang, Zhengyang, et al.
Published: (2025)
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
by: Cui, Ganqu, et al.
Published: (2025)
SubgoalXL: Subgoal-based Expert Learning for Theorem Proving
by: Zhao, Xueliang, et al.
Published: (2024)
Scaling Multimodal Search and Recommendation with Small Language Models via Upside-Down Reinforcement Learning
by: Lin, Yu-Chen, et al.
Published: (2025)
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
by: Xie, Tianbao, et al.
Published: (2023)
Self-Hinting Language Models Enhance Reinforcement Learning
by: Liao, Baohao, et al.
Published: (2026)
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
by: Heuillet, Maxime, et al.
Published: (2025)
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
by: Huang, Jie, et al.
Published: (2023)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
by: Xi, Zhiheng, et al.
Published: (2024)
SelfIE: Self-Interpretation of Large Language Model Embeddings
by: Chen, Haozhe, et al.
Published: (2024)
SambaLingo: Teaching Large Language Models New Languages
by: Csaki, Zoltan, et al.
Published: (2024)
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
by: Zhang, Haozhen, et al.
Published: (2025)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
by: Luo, Haipeng, et al.
Published: (2023)
AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models
by: Lin, Xuan, et al.
Published: (2025)
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
by: Rosset, Corby, et al.
Published: (2024)
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
by: Dong, Guanting, et al.
Published: (2025)
Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
by: Kong, Lingxiao, et al.
Published: (2025)
Learn To be Efficient: Build Structured Sparsity in Large Language Models
by: Zheng, Haizhong, et al.
Published: (2024)
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
by: Wang, Siwei, et al.
Published: (2025)
Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2024)
PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning
by: Lu, Yao, et al.
Published: (2026)
Imitating Language via Scalable Inverse Reinforcement Learning
by: Wulfmeier, Markus, et al.
Published: (2024)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
by: Wang, Yiping, et al.
Published: (2025)
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
by: Zhu, Xudong, et al.
Published: (2025)
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
by: Alsadat, Shayan Meshkat, et al.
Published: (2024)