| Main authors: | Tan, Zhiquan; Hong, Yinrong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.17497 |
Similar documents
PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
by: Tan, Zhiquan, et al.
Published: (2026)
A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models
by: Tan, Zhiquan, et al.
Published: (2025)
Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
by: Hong, Yinrong, et al.
Published: (2025)
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
by: Zhao, Siyan, et al.
Published: (2026)
Matrix Information Theory for Self-Supervised Learning
by: Zhang, Yifan, et al.
Published: (2023)
ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains
by: Zhao, Ziqi, et al.
Published: (2026)
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
by: Tan, Zhiquan, et al.
Published: (2024)
The Information of Large Language Model Geometry
by: Tan, Zhiquan, et al.
Published: (2024)
CRISP: Compressed Reasoning via Iterative Self-Policy Distillation
by: Sang, Hejian, et al.
Published: (2026)
Information-Theoretic Perspectives on Optimizers
by: Tan, Zhiquan, et al.
Published: (2025)
Unveiling the Dynamics of Information Interplay in Supervised Learning
by: Song, Kun, et al.
Published: (2024)
Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
by: Song, Kun, et al.
Published: (2024)
OISD: On-Policy Internal Self-Distillation of Language Models
by: Liu, Xinyu, et al.
Published: (2026)
Understanding Grokking Through A Robustness Viewpoint
by: Tan, Zhiquan, et al.
Published: (2023)
Diversity-Aware Policy Optimization for Large Language Model Reasoning
by: Yao, Jian, et al.
Published: (2025)
Self-Supervised Dataset Distillation for Transfer Learning
by: Lee, Dong Bok, et al.
Published: (2023)
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
by: Agarwal, Rishabh, et al.
Published: (2023)
When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning
by: Liu, Xiaogeng, et al.
Published: (2026)
OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
by: Yang, Yuxiao, et al.
Published: (2026)
PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
by: Xu, Yuanda, et al.
Published: (2026)
Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition
by: Nolan, Matthew, et al.
Published: (2025)
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
by: Li, Hong, et al.
Published: (2024)
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models
by: Wang, Hao, et al.
Published: (2026)
LEPO: Latent Reasoning Policy Optimization for Large Language Models
by: Zhou, Yuyan, et al.
Published: (2026)
A Survey of On-Policy Distillation for Large Language Models
by: Song, Mingyang, et al.
Published: (2026)
HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
by: Ding, Ken
Published: (2026)
Osmosis Distillation: Model Hijacking with the Fewest Samples
by: Shi, Yuchen, et al.
Published: (2026)
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
by: Xu, Xin, et al.
Published: (2025)
Self-Distillation of Hidden Layers for Self-Supervised Representation Learning
by: Lowe, Scott C., et al.
Published: (2026)
Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
by: Ko, Jongwoo, et al.
Published: (2026)
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
by: Joshi, Siddharth, et al.
Published: (2024)
Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
by: Wei, Lai, et al.
Published: (2024)
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
by: Cazenavette, George, et al.
Published: (2025)
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)
Leave No One Behind: Online Self-Supervised Self-Distillation for Sequential Recommendation
by: Wei, Shaowei, et al.
Published: (2024)
AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
by: Zhang, Songming, et al.
Published: (2025)
Self-Supervised Quantization-Aware Knowledge Distillation
by: Zhao, Kaiqi, et al.
Published: (2024)
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
by: Shen, Ao, et al.
Published: (2024)
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
by: Wu, Yecheng, et al.
Published: (2026)
Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation
by: Fu, Yu, et al.
Published: (2026)