Library Catalog

Bibliographic Details
Main Authors:	Souibgui, Mohamed Ali; Fostier, Jan; Abadía-Heredia, Rodrigo; Denysenko, Bohdan; Marschke, Christian; Peric, Igor
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2604.22050

Similar Items

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
by: Naim, Omar, et al.
Published: (2025)

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
by: Zarch, Hossein Entezari, et al.
Published: (2025)

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
by: Qiu, Quantong, et al.
Published: (2026)

High-Layer Attention Pruning with Rescaling
by: Liu, Songtao, et al.
Published: (2025)

Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse
by: Fu, Zizhuo, et al.
Published: (2026)

Paying Attention to Facts: Quantifying the Knowledge Capacity of Attention Layers
by: Wong, Liang Ze
Published: (2025)

Multi-Layer Attention is the Amplifier of Demonstration Effectiveness
by: Wang, Dingzirui, et al.
Published: (2025)

DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
by: Tan, Zhen, et al.
Published: (2024)

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
by: ElNokrashy, Muhammad, et al.
Published: (2022)

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
by: Zarch, Hossein Entezari, et al.
Published: (2025)

An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction
by: Hamdi, Ali, et al.
Published: (2025)

When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models
by: Sanyal, Sunny, et al.
Published: (2024)

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
by: Kapadia, Shashank, et al.
Published: (2026)

CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill
by: McDanel, Bradley, et al.
Published: (2026)

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)

Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
by: Shi, Zhenmei, et al.
Published: (2024)

A Semantic-Aware Layer-Freezing Approach to Computation-Efficient Fine-Tuning of Language Models
by: Gu, Jian, et al.
Published: (2024)

DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
by: Yang, Ning, et al.
Published: (2025)

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
by: Achtibat, Reduan, et al.
Published: (2024)

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
by: Bozic, Vukasin, et al.
Published: (2023)

Dr.LLM: Dynamic Layer Routing in LLMs
by: Heakl, Ahmed, et al.
Published: (2025)

Not All Layers of LLMs Are Necessary During Inference
by: Fan, Siqi, et al.
Published: (2024)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)

Wave-PDE Nets: Trainable Wave-Equation Layers as an Alternative to Attention
by: Vejendla, Harshil
Published: (2025)

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
by: Bai, Yushi, et al.
Published: (2026)

Efficient LLM Moderation with Multi-Layer Latent Prototypes
by: Chrabąszcz, Maciej, et al.
Published: (2025)

Calibration Across Layers: Understanding Calibration Evolution in LLMs
by: Joshi, Abhinav, et al.
Published: (2025)

Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
by: Jiang, Jingzhou, et al.
Published: (2026)

Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
by: Yang, Zhipeng, et al.
Published: (2025)

A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
by: Goel, Raghavv, et al.
Published: (2026)

Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
by: Bombari, Simone, et al.
Published: (2024)

Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
by: Filipek, Adam
Published: (2025)

Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation
by: Song, Zhuo-Yang, et al.
Published: (2025)

Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge
by: Chen, Yan-Lun, et al.
Published: (2025)

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
by: Hu, Lanxiang, et al.
Published: (2024)

Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection
by: Elgabry, Menna, et al.
Published: (2025)

Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness
by: Jelenić, Fran, et al.
Published: (2023)

Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation
by: Ehab, Mohamed, et al.
Published: (2026)

Iterative Layer-wise Distillation for Efficient Compression of Large Language Models
by: Kovalev, Grigory, et al.
Published: (2025)

Towards Building Efficient Sentence BERT Models using Layer Pruning
by: Shelke, Anushka, et al.
Published: (2024)