| Main Authors: | Rang, Miao, Bi, Zhenni, Zhou, Hang, Han, Kai, Wang, Xuechun, Xiao, An, Chen, Xinghao, Wang, Yunhe, Chen, Hanting |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.05940 |
Similar Items
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation
by: Rang, Miao, et al.
Published: (2025)
An Empirical Study of Scaling Law for OCR
by: Rang, Miao, et al.
Published: (2023)
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
by: Rang, Miao, et al.
Published: (2025)
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
by: Bi, Zhenni, et al.
Published: (2024)
Nexus: Higher-Order Attention Mechanisms in Transformers
by: Chen, Hanting, et al.
Published: (2025)
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
by: Nie, Ying, et al.
Published: (2025)
ROOT: Robust Orthogonalized Optimizer for Neural Network Training
by: He, Wei, et al.
Published: (2025)
Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression
by: Wang, Haoyu, et al.
Published: (2025)
DiJiang: Efficient Large Language Models through Compact Kernelization
by: Chen, Hanting, et al.
Published: (2024)
Deferred Commitment Decoding for Diffusion Language Models
by: Shu, Yingte, et al.
Published: (2026)
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
by: Guo, Jialong, et al.
Published: (2024)
Transferable text data distillation by trajectory matching
by: Yao, Rong, et al.
Published: (2025)
GhostRNN: Reducing State Redundancy in RNN with Cheap Operations
by: Zhou, Hang, et al.
Published: (2024)
From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs
by: Tian, Yuchuan, et al.
Published: (2025)
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
by: Chen, Hanting, et al.
Published: (2025)
Multi-Granularity Semantic Revision for Large Language Model Distillation
by: Liu, Xiaoyu, et al.
Published: (2024)
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
by: Chen, Hanting, et al.
Published: (2025)
The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs
by: Li, Xin, et al.
Published: (2026)
Multiscale Positive-Unlabeled Detection of AI-Generated Texts
by: Tian, Yuchuan, et al.
Published: (2023)
Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection
by: He, Chaoqun, et al.
Published: (2026)
Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention
by: Lin, Wenye, et al.
Published: (2026)
Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance
by: Jiang, Yuxuan, et al.
Published: (2026)
Self-Policy Distillation via Capability-Selective Subspace Projection
by: Hao, Guangya, et al.
Published: (2026)
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
by: Ni, Yunsheng, et al.
Published: (2024)
An Empirical Study of World Model Quantization
by: Fu, Zhongqian, et al.
Published: (2026)
C-MOP: Integrating Momentum and Boundary-Aware Clustering for Enhanced Prompt Evolution
by: Yan, Binwei, et al.
Published: (2026)
MAIGO: Mitigating Lost-in-Conversation with History-Cleaned On-Policy Self-Distillation
by: Zheng, Haoyu, et al.
Published: (2026)
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
by: Tang, Yehui, et al.
Published: (2025)
Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
by: Ko, Jongwoo, et al.
Published: (2026)
MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate
by: Wang, Jianze, et al.
Published: (2026)
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
by: Guo, Jianyuan, et al.
Published: (2024)
Hybrid Policy Distillation for LLMs
by: Zhu, Wenhong, et al.
Published: (2026)
Trust Region On-Policy Distillation
by: Xing, Xingrun, et al.
Published: (2026)
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
by: Wang, Sudong, et al.
Published: (2026)
Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
by: Tu, Zhijun, et al.
Published: (2025)
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
by: Zhou, Yuhang, et al.
Published: (2026)
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
by: Xu, Xin, et al.
Published: (2025)
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)
Are Full Rollouts Necessary for On-Policy Distillation?
by: Zhang, Yaocheng, et al.
Published: (2026)