| Main Authors: | Hao, Guangya, Shang, Yitong, Long, Yunbo, Zhao, Zhuokai, Liang, Hanxue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.22675 |
Similar Items
Self-Evolving Multi-Agent Systems via Decentralized Memory
by: Hao, Guangya, et al.
Published: (2026)
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
by: Zhou, Yuhang, et al.
Published: (2026)
EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation
by: Long, Yunbo, et al.
Published: (2026)
Crosslingual On-Policy Self-Distillation for Multilingual Reasoning
by: Liu, Yihong, et al.
Published: (2026)
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
by: Zhao, Siyan, et al.
Published: (2026)
Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
by: Zhao, Zhengyang, et al.
Published: (2026)
Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
by: Wan, Guangya, et al.
Published: (2024)
Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing
by: Rang, Miao, et al.
Published: (2026)
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
by: Long, Yunbo, et al.
Published: (2026)
HARP: Hallucination Detection via Reasoning Subspace Projection
by: Hu, Junjie, et al.
Published: (2025)
Derailer-Rerailer: Adaptive Verification for Efficient and Reliable Language Model Reasoning
by: Wan, Guangya, et al.
Published: (2024)
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
by: Zhao, Haozhe, et al.
Published: (2024)
EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation
by: Long, Yunbo, et al.
Published: (2025)
Model Unlearning via Sparse Autoencoder Subspace Guided Projections
by: Wang, Xu, et al.
Published: (2025)
Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation
by: Zhang, Ruiqi, et al.
Published: (2026)
Measuring Affinity between Attention-Head Weight Subspaces via the Projection Kernel
by: Yamagiwa, Hiroaki, et al.
Published: (2026)
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
by: Akgül, Ömer Faruk, et al.
Published: (2026)
DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training
by: Tian, Xiaoyu, et al.
Published: (2025)
SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models
by: Liao, Huanxuan, et al.
Published: (2024)
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
by: Kim, Jeonghye, et al.
Published: (2026)
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
by: Chen, Zhaorun, et al.
Published: (2024)
Self-Correction Distillation for Structured Data Question Answering
by: Zhu, Yushan, et al.
Published: (2025)
Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models
by: Liu, Weize, et al.
Published: (2023)
Mem²Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation
by: Cheng, Zihao, et al.
Published: (2026)
Generating Logically Consistent Synthetic Supply Chain Data with LLM-Driven Knowledge Graph Reasoning
by: Long, Yunbo, et al.
Published: (2026)
EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery
by: Long, Yunbo, et al.
Published: (2025)
Project Aletheia: Verifier-Guided Distillation of Backtracking for Small Language Models
by: Dixit, Aradhya, et al.
Published: (2026)
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning
by: Ma, Yufei, et al.
Published: (2026)
MAIGO: Mitigating Lost-in-Conversation with History-Cleaned On-Policy Self-Distillation
by: Zheng, Haoyu, et al.
Published: (2026)
ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains
by: Zhao, Ziqi, et al.
Published: (2026)
X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
by: Cao, Di, et al.
Published: (2026)
Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages
by: Zhang, Yuanchi, et al.
Published: (2024)
Are Full Rollouts Necessary for On-Policy Distillation?
by: Zhang, Yaocheng, et al.
Published: (2026)
OPSDL: On-Policy Self-Distillation for Long-Context Language Models
by: Zhang, Xinsen, et al.
Published: (2026)
Evolving LLMs' Self-Refinement Capability via Synergistic Training-Inference Optimization
by: Zeng, Yongcheng, et al.
Published: (2025)
Can Compact Language Models Search Like Agents? Distillation-Guided Policy Optimization for Preserving Agentic RAG Capabilities
by: Kotoge, Rikuto, et al.
Published: (2025)
Generalizing Fair Top-k Selection: An Integrative Approach
by: Cai, Guangya
Published: (2026)
Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation
by: Li, Zichong, et al.
Published: (2026)
Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
by: Fu, Yao, et al.
Published: (2024)