| Main Authors: | Zhang, Haoran, Li, Yafu, Hu, Xuyang, Liu, Dongrui, Wang, Zhilin, Li, Bo, Cheng, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.14760 |
Similar Items
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
by: Li, Yafu, et al.
Published: (2025)
Characterizing, Evaluating, and Optimizing Complex Reasoning
by: Zhang, Haoran, et al.
Published: (2026)
New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR
by: Wang, Zhilin, et al.
Published: (2026)
SEE: Continual Fine-tuning with Sequential Ensemble of Experts
by: Wang, Zhilin, et al.
Published: (2025)
Rethinking Entropy Regularization in Large Reasoning Models
by: Jiang, Yuxian, et al.
Published: (2025)
Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
by: Wang, Zhilin, et al.
Published: (2025)
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
by: Li, Yafu, et al.
Published: (2025)
ExGRPO: Learning to Reason from Experience
by: Zhan, Runzhe, et al.
Published: (2025)
Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning
by: Wang, Zhilin, et al.
Published: (2025)
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
by: Chen, Guanxu, et al.
Published: (2025)
FaithRL: Learning to Reason Faithfully through Step-Level Faithfulness Maximization
by: Gui, Runquan, et al.
Published: (2026)
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
by: Fu, Tingchen, et al.
Published: (2025)
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
by: Li, Yafu, et al.
Published: (2026)
Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text
by: Li, Yafu, et al.
Published: (2024)
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
by: Li, Yafu, et al.
Published: (2025)
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
by: Zhang, Yadong, et al.
Published: (2024)
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
by: Qu, Xiaoye, et al.
Published: (2025)
MAGE: Machine-generated Text Detection in the Wild
by: Li, Yafu, et al.
Published: (2023)
Multilingual Safety Alignment via Self-Distillation
by: Qin, Ruiyang, et al.
Published: (2026)
A Survey of Test-Time Compute: From Intuitive Inference to Deliberate Reasoning
by: Ji, Yixin, et al.
Published: (2025)
Multi-LLM Collaborative Search for Complex Problem Solving
by: Yang, Sen, et al.
Published: (2025)
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
by: Zhang, Zhiwei, et al.
Published: (2025)
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
by: Chen, Guizhen, et al.
Published: (2025)
Draft-OPD: On-Policy Distillation for Speculative Draft Models
by: Lei, Haodi, et al.
Published: (2026)
Potential and Challenges of Model Editing for Social Debiasing
by: Yan, Jianhao, et al.
Published: (2024)
Long-Chain Reasoning Distillation via Adaptive Prefix Alignment
by: Liu, Zhenghao, et al.
Published: (2026)
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
by: Wu, Wenhao, et al.
Published: (2026)
X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability
by: Lu, Xiaoya, et al.
Published: (2025)
Keys to Robust Edits: from Theoretical Insights to Practical Advances
by: Yan, Jianhao, et al.
Published: (2024)
ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
by: Yang, Junyao, et al.
Published: (2026)
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models
by: Wang, Xinming, et al.
Published: (2025)
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
by: Yang, Ruixin, et al.
Published: (2024)
Multi-hop Reasoning via Early Knowledge Alignment
by: Wang, Yuxin, et al.
Published: (2025)
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
by: Xu, Chen, et al.
Published: (2026)
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
by: Cheng, Qianjia, et al.
Published: (2026)
What Have We Achieved on Non-autoregressive Translation?
by: Li, Yafu, et al.
Published: (2024)
SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
by: Huang, Yuqing, et al.
Published: (2025)
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
by: Jiang, Songtao, et al.
Published: (2025)
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
by: Tian, Xiaoyu, et al.
Published: (2025)