:: Library Catalog

Bibliographic Details
Main Authors:	Sarkar, Nilesh; Deka, Dawar Jyoti
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.04037

Similar Items

Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure
by: Sarkar, Nilesh, et al.
Published: (2026)

Spectral Superposition: A Theory of Feature Geometry
by: Ivanov, Georgi, et al.
Published: (2026)

Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
by: Bhardwaj, Kartikeya, et al.
Published: (2024)

The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
by: Pham, Hoang, et al.
Published: (2025)

Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation
by: Zhang, Hanrong, et al.
Published: (2025)

Understanding Emergent Misalignment via Feature Superposition Geometry
by: Minegishi, Gouki, et al.
Published: (2026)

Superposition in Graph Neural Networks
by: Pertl, Lukas, et al.
Published: (2025)

Sparsity and Superposition in Mixture of Experts
by: Chaudhari, Marmik, et al.
Published: (2025)

Mathematical Models of Computation in Superposition
by: Hänni, Kaarel, et al.
Published: (2024)

Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation
by: Zhang, Xiaoxiong, et al.
Published: (2024)

Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation
by: Chen, Yilong, et al.
Published: (2026)

On Implications of Scaling Laws on Feature Superposition
by: Katta, Pavan
Published: (2024)

Model Merging via Multi-Teacher Knowledge Distillation
by: Dalili, Seyed Arshan, et al.
Published: (2025)

Virtual Width Networks
by: Seed, et al.
Published: (2025)

DistillSpec: Improving Speculative Decoding via Knowledge Distillation
by: Zhou, Yongchao, et al.
Published: (2023)

Improve Knowledge Distillation via Label Revision and Data Selection
by: Lan, Weichao, et al.
Published: (2024)

Representer Theorems for Metric and Preference Learning: Geometric Insights and Algorithms
by: Morteza, Peyman
Published: (2023)

Enhancing Transformer with GNN Structural Knowledge via Distillation: A Novel Approach
by: Duan, Zhihua, et al.
Published: (2025)

Adaptive Width Neural Networks
by: Errica, Federico, et al.
Published: (2025)

Enhancing Graph Neural Networks with Limited Labeled Data by Actively Distilling Knowledge from Large Language Models
by: Li, Quan, et al.
Published: (2024)

Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
by: Pradhan, Siddhartha, et al.
Published: (2025)

Post-Pruning Accuracy Recovery via Data-Free Knowledge Distillation
by: Tripurwar, Chinmay, et al.
Published: (2025)

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision
by: Gu, Hongyu, et al.
Published: (2026)

Efficient Knowledge Distillation via Curriculum Extraction
by: Gupta, Shivam, et al.
Published: (2025)

On the Diminishing Returns of Width for Continual Learning
by: Guha, Etash, et al.
Published: (2024)

Graph Knowledge Distillation to Mixture of Experts
by: Rumiantsev, Pavel, et al.
Published: (2024)

Dynamic Temperature Scheduler for Knowledge Distillation
by: Islam, Sibgat Ul, et al.
Published: (2025)

Membership and Memorization in LLM Knowledge Distillation
by: Zhang, Ziqi, et al.
Published: (2025)

Principled Curriculum Learning using Parameter Continuation Methods
by: Pathak, Harsh Nilesh, et al.
Published: (2025)

Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation
by: Park, Seonghyeon, et al.
Published: (2026)

GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation
by: Galichin, Andrey V., et al.
Published: (2024)

Improving Group Fairness in Knowledge Distillation via Laplace Approximation of Early Exits
by: Fasth, Edvin, et al.
Published: (2025)

A Functional Perspective on Knowledge Distillation in Neural Networks
by: Mason-Williams, Israel, et al.
Published: (2025)

Cooperative Knowledge Distillation: A Learner Agnostic Approach
by: Livanos, Michael, et al.
Published: (2024)

Geometric Kolmogorov-Arnold Superposition Theorem
by: Alesiani, Francesco, et al.
Published: (2025)

RanDeS: Randomized Delta Superposition for Multi-Model Compression
by: Zhou, Hangyu, et al.
Published: (2025)

Superposition Yields Robust Neural Scaling
by: Liu, Yizhou, et al.
Published: (2025)

Compact Language Models via Pruning and Knowledge Distillation
by: Muralidharan, Saurav, et al.
Published: (2024)

On the Infinite Width and Depth Limits of Predictive Coding Networks
by: Innocenti, Francesco, et al.
Published: (2026)

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
by: Martra, Pere
Published: (2025)