:: Library Catalog

Bibliographic Details
Main Authors:	Tan, Zhiquan; Hong, Yinrong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.17497

Similar Items

PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
by: Tan, Zhiquan, et al.
Published: (2026)

A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models
by: Tan, Zhiquan, et al.
Published: (2025)

Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
by: Hong, Yinrong, et al.
Published: (2025)

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
by: Zhao, Siyan, et al.
Published: (2026)

Matrix Information Theory for Self-Supervised Learning
by: Zhang, Yifan, et al.
Published: (2023)

ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains
by: Zhao, Ziqi, et al.
Published: (2026)

Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
by: Tan, Zhiquan, et al.
Published: (2024)

The Information of Large Language Model Geometry
by: Tan, Zhiquan, et al.
Published: (2024)

CRISP: Compressed Reasoning via Iterative Self-Policy Distillation
by: Sang, Hejian, et al.
Published: (2026)

Information-Theoretic Perspectives on Optimizers
by: Tan, Zhiquan, et al.
Published: (2025)

Unveiling the Dynamics of Information Interplay in Supervised Learning
by: Song, Kun, et al.
Published: (2024)

Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
by: Song, Kun, et al.
Published: (2024)

OISD: On-Policy Internal Self-Distillation of Language Models
by: Liu, Xinyu, et al.
Published: (2026)

Understanding Grokking Through A Robustness Viewpoint
by: Tan, Zhiquan, et al.
Published: (2023)

Diversity-Aware Policy Optimization for Large Language Model Reasoning
by: Yao, Jian, et al.
Published: (2025)

Self-Supervised Dataset Distillation for Transfer Learning
by: Lee, Dong Bok, et al.
Published: (2023)

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
by: Agarwal, Rishabh, et al.
Published: (2023)

When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning
by: Liu, Xiaogeng, et al.
Published: (2026)

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
by: Yang, Yuxiao, et al.
Published: (2026)

PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
by: Xu, Yuanda, et al.
Published: (2026)

Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition
by: Nolan, Matthew, et al.
Published: (2025)

ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
by: Li, Hong, et al.
Published: (2024)

Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models
by: Wang, Hao, et al.
Published: (2026)

LEPO: Latent Reasoning Policy Optimization for Large Language Models
by: Zhou, Yuyan, et al.
Published: (2026)

A Survey of On-Policy Distillation for Large Language Models
by: Song, Mingyang, et al.
Published: (2026)

HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
by: Ding, Ken
Published: (2026)

Osmosis Distillation: Model Hijacking with the Fewest Samples
by: Shi, Yuchen, et al.
Published: (2026)

Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
by: Xu, Xin, et al.
Published: (2025)

Self-Distillation of Hidden Layers for Self-Supervised Representation Learning
by: Lowe, Scott C., et al.
Published: (2026)

Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
by: Ko, Jongwoo, et al.
Published: (2026)

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
by: Joshi, Siddharth, et al.
Published: (2024)

Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
by: Wei, Lai, et al.
Published: (2024)

Dataset Distillation for Pre-Trained Self-Supervised Vision Models
by: Cazenavette, George, et al.
Published: (2025)

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)

Leave No One Behind: Online Self-Supervised Self-Distillation for Sequential Recommendation
by: Wei, Shaowei, et al.
Published: (2024)

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
by: Zhang, Songming, et al.
Published: (2025)

Self-Supervised Quantization-Aware Knowledge Distillation
by: Zhao, Kaiqi, et al.
Published: (2024)

Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
by: Shen, Ao, et al.
Published: (2024)

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
by: Wu, Yecheng, et al.
Published: (2026)

Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation
by: Fu, Yu, et al.
Published: (2026)