| Main Authors: | Wan, Xu, Wang, Yansheng, Huang, Wenqi, Sun, Mingyang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.20722 |
Similar Items
AdapThink: Adaptive Thinking Preferences for Reasoning Language Model
by: Wan, Xu, et al.
Published: (2025)
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
by: Hong, Haitao, et al.
Published: (2025)
Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning
by: Wan, Qian, et al.
Published: (2026)
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
by: Zhang, Lei, et al.
Published: (2023)
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
by: Zhao, Shiwan, et al.
Published: (2026)
Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
by: Sun, Mingyang, et al.
Published: (2025)
CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models
by: Lv, Yaojia, et al.
Published: (2024)
Exploring and Unleashing the Power of Large Language Models in CI/CD Configuration Translation
by: Wang, Chong, et al.
Published: (2025)
Exploring and Unleashing the Power of Large Language Models in Automated Code Translation
by: Yang, Zhen, et al.
Published: (2024)
SAMG: Offline-to-Online Reinforcement Learning via State-Action-Conditional Offline Model Guidance
by: Zhang, Liyu, et al.
Published: (2024)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
by: Wan, Xu, et al.
Published: (2025)
EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering
by: Zong, Chang, et al.
Published: (2025)
Resource-Efficient Reinforcement for Reasoning Large Language Models via Dynamic One-Shot Policy Refinement
by: Zhang, Yunjian, et al.
Published: (2026)
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models
by: Huang, Xiaoke, et al.
Published: (2025)
SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning
by: Liu, Huanyu, et al.
Published: (2025)
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
by: Xiao, Yicheng, et al.
Published: (2025)
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
by: Liao, Yi, et al.
Published: (2025)
Effective Reinforcement Learning for Reasoning in Language Models
by: Huang, Lianghuan, et al.
Published: (2025)
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
by: Sun, Haoyuan, et al.
Published: (2025)
SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning
by: Wan, Xu, et al.
Published: (2025)
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
by: Xie, Tian, et al.
Published: (2025)
MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning
by: Shi, Wenqi, et al.
Published: (2024)
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
by: Heuillet, Maxime, et al.
Published: (2025)
Premise Order Matters in Reasoning with Large Language Models
by: Chen, Xinyun, et al.
Published: (2024)
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
by: Xu, Fengli, et al.
Published: (2025)
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
by: Tang, Jiakai, et al.
Published: (2025)
On Predictability of Reinforcement Learning Dynamics for Large Language Models
by: Cai, Yuchen, et al.
Published: (2025)
AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
by: Yeh, Cheng-Kai, et al.
Published: (2025)
Training Large Language Models to Reason via EM Policy Gradient
by: Xu, Tianbing
Published: (2025)
Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router
by: Shao, Chenyang, et al.
Published: (2025)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
by: Zhou, Pengfei, et al.
Published: (2025)
Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism
by: Wang, Zhiwei, et al.
Published: (2024)
Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models
by: Zhang, Che, et al.
Published: (2024)
Rationale-Grounded In-Context Learning for Time Series Reasoning with Multimodal Large Language Models
by: Liu, Qingxiang, et al.
Published: (2026)
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2024)
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
by: Zhang, Wenhao, et al.
Published: (2025)