:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Haoran; Li, Yafu; Hu, Xuyang; Liu, Dongrui; Wang, Zhilin; Li, Bo; Cheng, Yu
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2509.14760

Similar Items

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
by: Li, Yafu, et al.
Published: (2025)

Characterizing, Evaluating, and Optimizing Complex Reasoning
by: Zhang, Haoran, et al.
Published: (2026)

New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR
by: Wang, Zhilin, et al.
Published: (2026)

SEE: Continual Fine-tuning with Sequential Ensemble of Experts
by: Wang, Zhilin, et al.
Published: (2025)

Rethinking Entropy Regularization in Large Reasoning Models
by: Jiang, Yuxian, et al.
Published: (2025)

Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
by: Wang, Zhilin, et al.
Published: (2025)

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
by: Li, Yafu, et al.
Published: (2025)

ExGRPO: Learning to Reason from Experience
by: Zhan, Runzhe, et al.
Published: (2025)

Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning
by: Wang, Zhilin, et al.
Published: (2025)

Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
by: Chen, Guanxu, et al.
Published: (2025)

FaithRL: Learning to Reason Faithfully through Step-Level Faithfulness Maximization
by: Gui, Runquan, et al.
Published: (2026)

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
by: Fu, Tingchen, et al.
Published: (2025)

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
by: Li, Yafu, et al.
Published: (2026)

Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text
by: Li, Yafu, et al.
Published: (2024)

Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)

Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
by: Li, Yafu, et al.
Published: (2025)

Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
by: Zhang, Yadong, et al.
Published: (2024)

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
by: Qu, Xiaoye, et al.
Published: (2025)

MAGE: Machine-generated Text Detection in the Wild
by: Li, Yafu, et al.
Published: (2023)

Multilingual Safety Alignment via Self-Distillation
by: Qin, Ruiyang, et al.
Published: (2026)

A Survey of Test-Time Compute: From Intuitive Inference to Deliberate Reasoning
by: Ji, Yixin, et al.
Published: (2025)

Multi-LLM Collaborative Search for Complex Problem Solving
by: Yang, Sen, et al.
Published: (2025)

Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
by: Zhang, Zhiwei, et al.
Published: (2025)

FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
by: Chen, Guizhen, et al.
Published: (2025)

Draft-OPD: On-Policy Distillation for Speculative Draft Models
by: Lei, Haodi, et al.
Published: (2026)

Potential and Challenges of Model Editing for Social Debiasing
by: Yan, Jianhao, et al.
Published: (2024)

Long-Chain Reasoning Distillation via Adaptive Prefix Alignment
by: Liu, Zhenghao, et al.
Published: (2026)

From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
by: Wu, Wenhao, et al.
Published: (2026)

X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability
by: Lu, Xiaoya, et al.
Published: (2025)

Keys to Robust Edits: from Theoretical Insights to Practical Advances
by: Yan, Jianhao, et al.
Published: (2024)

ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
by: Yang, Junyao, et al.
Published: (2026)

MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models
by: Wang, Xinming, et al.
Published: (2025)

Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
by: Yang, Ruixin, et al.
Published: (2024)

Multi-hop Reasoning via Early Knowledge Alignment
by: Wang, Yuxin, et al.
Published: (2025)

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
by: Xu, Chen, et al.
Published: (2026)

Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
by: Cheng, Qianjia, et al.
Published: (2026)

What Have We Achieved on Non-autoregressive Translation?
by: Li, Yafu, et al.
Published: (2024)

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
by: Huang, Yuqing, et al.
Published: (2025)

Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
by: Jiang, Songtao, et al.
Published: (2025)

Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
by: Tian, Xiaoyu, et al.
Published: (2025)