| Main Authors: | Sarkar, Nilesh; Deka, Dawar Jyoti |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.04037 |
Similar Items
Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure
by: Sarkar, Nilesh, et al.
Published: (2026)
Spectral Superposition: A Theory of Feature Geometry
by: Ivanov, Georgi, et al.
Published: (2026)
Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
by: Bhardwaj, Kartikeya, et al.
Published: (2024)
The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
by: Pham, Hoang, et al.
Published: (2025)
Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation
by: Zhang, Hanrong, et al.
Published: (2025)
Understanding Emergent Misalignment via Feature Superposition Geometry
by: Minegishi, Gouki, et al.
Published: (2026)
Superposition in Graph Neural Networks
by: Pertl, Lukas, et al.
Published: (2025)
Sparsity and Superposition in Mixture of Experts
by: Chaudhari, Marmik, et al.
Published: (2025)
Mathematical Models of Computation in Superposition
by: Hänni, Kaarel, et al.
Published: (2024)
Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation
by: Zhang, Xiaoxiong, et al.
Published: (2024)
Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation
by: Chen, Yilong, et al.
Published: (2026)
On Implications of Scaling Laws on Feature Superposition
by: Katta, Pavan
Published: (2024)
Model Merging via Multi-Teacher Knowledge Distillation
by: Dalili, Seyed Arshan, et al.
Published: (2025)
Virtual Width Networks
by: Seed, et al.
Published: (2025)
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
by: Zhou, Yongchao, et al.
Published: (2023)
Improve Knowledge Distillation via Label Revision and Data Selection
by: Lan, Weichao, et al.
Published: (2024)
Representer Theorems for Metric and Preference Learning: Geometric Insights and Algorithms
by: Morteza, Peyman
Published: (2023)
Enhancing Transformer with GNN Structural Knowledge via Distillation: A Novel Approach
by: Duan, Zhihua, et al.
Published: (2025)
Adaptive Width Neural Networks
by: Errica, Federico, et al.
Published: (2025)
Enhancing Graph Neural Networks with Limited Labeled Data by Actively Distilling Knowledge from Large Language Models
by: Li, Quan, et al.
Published: (2024)
Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
by: Pradhan, Siddhartha, et al.
Published: (2025)
Post-Pruning Accuracy Recovery via Data-Free Knowledge Distillation
by: Tripurwar, Chinmay, et al.
Published: (2025)
Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision
by: Gu, Hongyu, et al.
Published: (2026)
Efficient Knowledge Distillation via Curriculum Extraction
by: Gupta, Shivam, et al.
Published: (2025)
On the Diminishing Returns of Width for Continual Learning
by: Guha, Etash, et al.
Published: (2024)
Graph Knowledge Distillation to Mixture of Experts
by: Rumiantsev, Pavel, et al.
Published: (2024)
Dynamic Temperature Scheduler for Knowledge Distillation
by: Islam, Sibgat Ul, et al.
Published: (2025)
Membership and Memorization in LLM Knowledge Distillation
by: Zhang, Ziqi, et al.
Published: (2025)
Principled Curriculum Learning using Parameter Continuation Methods
by: Pathak, Harsh Nilesh, et al.
Published: (2025)
Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation
by: Park, Seonghyeon, et al.
Published: (2026)
GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
by: Galichin, Andrey V., et al.
Published: (2024)
Improving Group Fairness in Knowledge Distillation via Laplace Approximation of Early Exits
by: Fasth, Edvin, et al.
Published: (2025)
A Functional Perspective on Knowledge Distillation in Neural Networks
by: Mason-Williams, Israel, et al.
Published: (2025)
Cooperative Knowledge Distillation: A Learner Agnostic Approach
by: Livanos, Michael, et al.
Published: (2024)
Geometric Kolmogorov-Arnold Superposition Theorem
by: Alesiani, Francesco, et al.
Published: (2025)
RanDeS: Randomized Delta Superposition for Multi-Model Compression
by: Zhou, Hangyu, et al.
Published: (2025)
Superposition Yields Robust Neural Scaling
by: Liu, Yizhou, et al.
Published: (2025)
Compact Language Models via Pruning and Knowledge Distillation
by: Muralidharan, Saurav, et al.
Published: (2024)
On the Infinite Width and Depth Limits of Predictive Coding Networks
by: Innocenti, Francesco, et al.
Published: (2026)
Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
by: Martra, Pere
Published: (2025)