| Main Authors: | Wang, Futing; Yan, Jianhao; Luo, Yun; Cui, Ganqu; Wang, Zhi; Qu, Xiaoye; Zhang, Yue; Cheng, Yu; Lin, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.11748 |
Similar Items
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
ELICIT: LLM Augmentation via External In-Context Capability
by: Wang, Futing, et al.
Published: (2024)
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
by: Yang, Wang, et al.
Published: (2025)
Keys to Robust Edits: from Theoretical Insights to Practical Advances
by: Yan, Jianhao, et al.
Published: (2024)
Diversity-Incentivized Exploration for Versatile Reasoning
by: Hu, Zican, et al.
Published: (2025)
Potential and Challenges of Model Editing for Social Debiasing
by: Yan, Jianhao, et al.
Published: (2024)
Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
by: Chen, Hung-Hsuan
Published: (2026)
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
by: Hu, Zican, et al.
Published: (2025)
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
by: Su, Zhaochen, et al.
Published: (2025)
Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
by: Gao, Shujian, et al.
Published: (2026)
Spotlight on Token Perception for Multimodal Reinforcement Learning
by: Huang, Siyuan, et al.
Published: (2025)
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)
Deeper Learning, Dialogic Learning, and Critical Thinking
Published: (2025)
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model
by: Ding, Bowen, et al.
Published: (2025)
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
by: Zheng, Ziwei, et al.
Published: (2025)
EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework
by: Wang, Chen, et al.
Published: (2025)
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
by: Whitehouse, Chenxi, et al.
Published: (2025)
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
by: Shen, Chuming, et al.
Published: (2025)
Draft-OPD: On-Policy Distillation for Speculative Draft Models
by: Lei, Haodi, et al.
Published: (2026)
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
by: Yan, Jianhao, et al.
Published: (2025)
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
by: Yan, Jianhao, et al.
Published: (2024)
Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning
by: Cui, Zhenyu, et al.
Published: (2026)
THREAD: Thinking Deeper with Recursive Spawning
by: Schroeder, Philip, et al.
Published: (2024)
AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving
by: Luo, Yuechen, et al.
Published: (2025)
Exploring the Robustness of In-Context Learning with Noisy Labels
by: Cheng, Chen, et al.
Published: (2024)
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
by: Wang, Haozhe, et al.
Published: (2025)
SCI-Verifier: Scientific Verifier with Thinking
by: Zheng, Shenghe, et al.
Published: (2025)
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
by: Liu, Shih-Yang, et al.
Published: (2025)
VideoSSR: Video Self-Supervised Reinforcement Learning
by: He, Zefeng, et al.
Published: (2025)
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
by: Wang, Jiaqi, et al.
Published: (2025)
Deeper Insights into Learning Performance of Stochastic Configuration Networks
by: Yan, Xiufeng, et al.
Published: (2024)
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
by: Chen, Huayu, et al.
Published: (2025)
Understanding In-Context Learning from Repetitions
by: Yan, Jianhao, et al.
Published: (2023)
Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning
by: Yin, Qingyu, et al.
Published: (2024)
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
by: Cheng, Qianjia, et al.
Published: (2026)
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
by: An, Sohyun, et al.
Published: (2025)
Evolving Deeper LLM Thinking
by: Lee, Kuang-Huei, et al.
Published: (2025)
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
by: Chen, Shuang, et al.
Published: (2025)
P1: Mastering Physics Olympiads with Reinforcement Learning
by: Chen, Jiacheng, et al.
Published: (2025)
Reinforcement Learning for Adaptive Resource Scheduling in Complex System Environments
by: Li, Pochun, et al.
Published: (2024)