| Main Authors: | Zheng, Yilun, Ma, Dongyang, Liang, Tian, Xu, Jiahao, Huang, Xinting, Chen, Lihui, Mi, Haitao, Wang, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.08030 |
Similar Items
The End of Manual Decoding: Towards Truly End-to-End Language Models
by: Wang, Zhichao, et al.
Published: (2025)
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
by: Liang, Tian, et al.
Published: (2025)
Less is More: Denoising Knowledge Graphs For Retrieval Augmented Generation
by: Zheng, Yilun, et al.
Published: (2025)
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
by: Li, Chunyang, et al.
Published: (2025)
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
by: Huang, Jianheng, et al.
Published: (2024)
Graph-O1 : Monte Carlo Tree Search with Reinforcement Learning for Text-Attributed Graph Reasoning
by: Liu, Lihui
Published: (2025)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
by: Liu, Xiaoyuan, et al.
Published: (2026)
Block-Attention for Efficient Prefilling
by: Ma, Dongyang, et al.
Published: (2024)
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
by: Huang, Xinting, et al.
Published: (2025)
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)
Self-Consistency Boosts Calibration for Math Reasoning
by: Wang, Ante, et al.
Published: (2024)
Offline Learning and Forgetting for Reasoning with Large Language Models
by: Ni, Tianwei, et al.
Published: (2025)
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
by: Wang, Ante, et al.
Published: (2024)
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
Routing-Free Mixture-of-Experts
by: Liu, Yilun, et al.
Published: (2026)
FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis
by: Zheng, Yilun, et al.
Published: (2025)
LLM Unlearning via Loss Adjustment with Only Forget Data
by: Wang, Yaxuan, et al.
Published: (2024)
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding
by: Liu, Tianqiao, et al.
Published: (2024)
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
by: Zhang, Qifan, et al.
Published: (2026)
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
by: Yao, Wenlin, et al.
Published: (2024)
PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection
by: Cheng, Siyuan, et al.
Published: (2026)
Reasoning with Sampling: Your Base Model is Smarter Than You Think
by: Karan, Aayush, et al.
Published: (2025)
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
by: Wang, Mengru, et al.
Published: (2025)
Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away
by: Ghosal, Soumya Suvra, et al.
Published: (2026)
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models
by: Pirozelli, Paulo, et al.
Published: (2023)
Revisiting Catastrophic Forgetting in Large Language Model Tuning
by: Li, Hongyu, et al.
Published: (2024)
Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations
by: Luo, Haozheng, et al.
Published: (2026)
SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning
by: Li, Xuchen, et al.
Published: (2025)
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
by: Diao, Muxi, et al.
Published: (2026)
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
by: Chen, Zui, et al.
Published: (2025)
Dual-Head Reasoning Distillation: Improving Classifier Accuracy with Train-Time-Only Reasoning
by: Xu, Jillian, et al.
Published: (2025)
Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation
by: Zhang, Ziyin, et al.
Published: (2024)
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
by: Wang, Yuquan, et al.
Published: (2025)
R-Zero: Self-Evolving Reasoning LLM from Zero Data
by: Huang, Chengsong, et al.
Published: (2025)
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing
by: Liu, Hongjun, et al.
Published: (2025)