| Main Authors: | Chen, Wei-Rui, Kothapalli, Vignesh, Fatahibaarzi, Ata, Sang, Hejian, Tang, Shao, Song, Qingquan, Wang, Zhipeng, Abdul-Mageed, Muhammad |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.21002 |
Similar Items
To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples
by: Kothapalli, Vignesh, et al.
Published: (2025)
To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation
by: Waheed, Abdul, et al.
Published: (2024)
uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes
by: Waheed, Abdul, et al.
Published: (2024)
Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems
by: Behdin, Kayhan, et al.
Published: (2025)
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
by: Wu, Minghao, et al.
Published: (2023)
CRISP: Compressed Reasoning via Iterative Self-Policy Distillation
by: Sang, Hejian, et al.
Published: (2026)
SODA: Semi On-Policy Black-Box Distillation for Large Language Models
by: Chen, Xiwen, et al.
Published: (2026)
PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
by: Xu, Yuanda, et al.
Published: (2026)
Liger Kernel: Efficient Triton Kernels for LLM Training
by: Hsu, Pin-Lun, et al.
Published: (2024)
Distilling Text Style Transfer With Self-Explanation From LLMs
by: Zhang, Chiyu, et al.
Published: (2024)
Interplay of Machine Translation, Diacritics, and Diacritization
by: Chen, Wei-Rui, et al.
Published: (2024)
AfroScope: A Framework for Studying the Linguistic Landscape of Africa
by: Kwon, Sang Yun, et al.
Published: (2026)
TIP: Token Importance in On-Policy Distillation
by: Xu, Yuanda, et al.
Published: (2026)
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
by: Zhu, Siyu, et al.
Published: (2025)
CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
by: Kothapalli, Vignesh, et al.
Published: (2025)
Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs
by: Mekki, Abdellah El, et al.
Published: (2024)
On Barriers to Archival Audio Processing
by: Sullivan, Peter, et al.
Published: (2025)
Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation
by: Waheed, Abdul, et al.
Published: (2025)
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
by: Chen, Wei-Rui, et al.
Published: (2023)
Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
by: Ko, Jongwoo, et al.
Published: (2026)
Towards Zero-Shot Text-To-Speech for Arabic Dialects
by: Doan, Khai Duy, et al.
Published: (2024)
Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization
by: He, Junlin, et al.
Published: (2026)
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
by: Alwajih, Fakhraddin, et al.
Published: (2024)
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
by: Yan, Shaotian, et al.
Published: (2026)
LLM-Guided Knowledge Distillation for Temporal Knowledge Graph Reasoning
by: Xing, Wang, et al.
Published: (2026)
Toucan: Many-to-Many Translation for 150 African Language Pairs
by: Elmadany, AbdelRahim, et al.
Published: (2024)
Zero-Shot Context-Aware ASR for Diverse Arabic Varieties
by: Talafha, Bashar, et al.
Published: (2025)
Cheetah: Natural Language Generation for 517 African Languages
by: Adebara, Ife, et al.
Published: (2024)
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)
Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
by: Li, Pingzhi, et al.
Published: (2025)
Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer
by: Zhang, Xiang, et al.
Published: (2024)
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
by: Lin, Xiaofeng, et al.
Published: (2026)
Gazelle: An Instruction Dataset for Arabic Writing Assistance
by: Magdy, Samar M., et al.
Published: (2024)
Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models
by: Xing, Wang, et al.
Published: (2026)
RLKD: Distilling LLMs' Reasoning via Reinforcement Learning
by: Xu, Shicheng, et al.
Published: (2025)
EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
by: Naeem, Numaan, et al.
Published: (2025)
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
by: Wu, Tong, et al.
Published: (2025)
Effective Distillation of Table-based Reasoning Ability from LLMs
by: Yang, Bohao, et al.
Published: (2023)
Arabic Automatic Story Generation with Large Language Models
by: El-Shangiti, Ahmed Oumar, et al.
Published: (2024)
Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation
by: Zhang, Lechen, et al.
Published: (2026)