| Main Authors: | Qu, Xiaoye, Li, Yafu, Su, Zhao-Chen, Sun, Weigao, Yan, Jianhao, Liu, Dongrui, Cui, Ganqu, Liu, Daizong, Liang, Shuxian, He, Junxian, Li, Peng, Wei, Wei, Shao, Jing, Lu, Chaochao, Zhang, Yue, Hua, Xian-Sheng, Zhou, Bowen, Cheng, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.21614 |
Similar Items
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
ExGRPO: Learning to Reason from Experience
by: Zhan, Runzhe, et al.
Published: (2025)
Spotlight on Token Perception for Multimodal Reinforcement Learning
by: Huang, Siyuan, et al.
Published: (2025)
VideoSSR: Video Self-Supervised Reinforcement Learning
by: He, Zefeng, et al.
Published: (2025)
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
by: Fu, Tingchen, et al.
Published: (2025)
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
by: He, Zefeng, et al.
Published: (2025)
Diversity-Incentivized Exploration for Versatile Reasoning
by: Hu, Zican, et al.
Published: (2025)
Rethinking Entropy Regularization in Large Reasoning Models
by: Jiang, Yuxian, et al.
Published: (2025)
FaithRL: Learning to Reason Faithfully through Step-Level Faithfulness Maximization
by: Gui, Runquan, et al.
Published: (2026)
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
by: Liu, Daizong, et al.
Published: (2024)
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
by: Huang, Siyuan, et al.
Published: (2026)
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
by: Chen, Shuang, et al.
Published: (2025)
Characterizing, Evaluating, and Optimizing Complex Reasoning
by: Zhang, Haoran, et al.
Published: (2026)
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
by: Shen, Chuming, et al.
Published: (2025)
Draft-OPD: On-Policy Distillation for Speculative Draft Models
by: Lei, Haodi, et al.
Published: (2026)
New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR
by: Wang, Zhilin, et al.
Published: (2026)
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
by: Zhang, Haoran, et al.
Published: (2025)
A Survey of Reinforcement Learning for Large Reasoning Models
by: Zhang, Kaiyan, et al.
Published: (2025)
SEE: Continual Fine-tuning with Sequential Ensemble of Experts
by: Wang, Zhilin, et al.
Published: (2025)
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
by: Chen, Guanxu, et al.
Published: (2025)
Diving into Self-Evolving Training for Multimodal Reasoning
by: Liu, Wei, et al.
Published: (2024)
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
by: Su, Zhaochen, et al.
Published: (2025)
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
by: Li, Yafu, et al.
Published: (2025)
Rethinking Video-Language Model from the Language Input Perspective
by: Fang, Xiang, et al.
Published: (2026)
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
by: Ren, Qihan, et al.
Published: (2026)
GEMS: Agent-Native Multimodal Generation with Memory and Skills
by: He, Zefeng, et al.
Published: (2026)
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
by: Wang, Futing, et al.
Published: (2026)
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
by: Li, Yafu, et al.
Published: (2025)
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
by: Qu, Xiaoye, et al.
Published: (2024)
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
by: Sun, Weigao, et al.
Published: (2025)
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
by: Sun, Weigao, et al.
Published: (2025)
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
by: Sun, Weigao, et al.
Published: (2025)
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
by: Qian, Chen, et al.
Published: (2025)
ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
by: Yang, Junyao, et al.
Published: (2026)
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
by: Li, Yafu, et al.
Published: (2026)
Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning
by: Wang, Zhilin, et al.
Published: (2025)
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
by: Liu, Daizong, et al.
Published: (2024)
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?
by: Su, Zhaochen, et al.
Published: (2024)
Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs
by: Li, Junxian, et al.
Published: (2025)