:: Library Catalog

Bibliographic Details
Main Authors:	Faye, Bilal; Mbaye, Abdoulaye; Azzag, Hanane; Lebbah, Mustapha
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.22583

Similar Items

Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection
by: Faye, Bilal, et al.
Published: (2024)

Prototype-Guided Diffusion: Visual Conditioning without External Memory
by: Faye, Bilal, et al.
Published: (2025)

Supervised Batch Normalization
by: Faye, Bilal, et al.
Published: (2024)

Value-Free Policy Optimization via Reward Partitioning
by: Faye, Bilal, et al.
Published: (2025)

OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
by: Faye, Bilal, et al.
Published: (2024)

Unsupervised Adaptive Normalization
by: Faye, Bilal, et al.
Published: (2024)

Lightweight Cross-Modal Representation Learning
by: Faye, Bilal, et al.
Published: (2024)

Adaptative Context Normalization: A Boost for Deep Learning in Image Processing
by: Faye, Bilal, et al.
Published: (2024)

Enhancing Neural Network Representations with Prior Knowledge-Based Normalization
by: Faye, Bilal, et al.
Published: (2024)

Game Theory Meets Statistical Mechanics in Deep Learning Design
by: Bouchaffra, Djamel, et al.
Published: (2024)

Context Normalization Layer with Applications
by: Faye, Bilal, et al.
Published: (2023)

Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models
by: Devynck, Tom, et al.
Published: (2026)

Coalition Free Energy and Adaptive Precision in Multi-Agent Cooperation
by: Bouchaffra, Djamel, et al.
Published: (2026)

NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics
by: Bouchaffra, Djamel, et al.
Published: (2026)

Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model
by: Khoufache, Reda, et al.
Published: (2024)

MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing
by: Radouane, Karim, et al.
Published: (2025)

A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics
by: Bouchaffra, Djamel, et al.
Published: (2026)

Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection
by: Amekoe, Kodjo Mawuena, et al.
Published: (2024)

Manual Verbalizer Enrichment for Few-Shot Text Classification
by: Nguyen, Quang Anh, et al.
Published: (2024)

MoH: Multi-Head Attention as Mixture-of-Head Attention
by: Jin, Peng, et al.
Published: (2024)

Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)

Interleaved Head Attention
by: Duvvuri, Sai Surya, et al.
Published: (2026)

Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)

Efficient LLMs with AMP: Attention Heads and MLP Pruning
by: Mugnaini, Leandro Giusti, et al.
Published: (2025)

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
by: Chen, Yilong, et al.
Published: (2024)

CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)

Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
by: Donhauser, Konstantin, et al.
Published: (2025)

Boosting House Price Estimations with Multi-Head Gated Attention
by: Sellam, Zakaria Abdellah, et al.
Published: (2024)

Geometric Analysis of Token Selection in Multi-Head Attention
by: Mudarisov, Timur, et al.
Published: (2026)

Improving Transformers with Dynamically Composable Multi-Head Attention
by: Xiao, Da, et al.
Published: (2024)

Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)

Efficient Image Generation with Variadic Attention Heads
by: Walton, Steven, et al.
Published: (2022)

Benign Overfitting in Single-Head Attention
by: Magen, Roey, et al.
Published: (2024)

A Capacity-Based Rationale for Multi-Head Attention
by: Adler, Micah
Published: (2025)

Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism
by: Cui, Chenwei, et al.
Published: (2026)

Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention
by: Borde, Haitz Sáez de Ocáriz
Published: (2025)

Multi-Head Spectral-Adaptive Graph Anomaly Detection
by: Cao, Qingyue, et al.
Published: (2025)

The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
by: Otsuka, Hikari, et al.
Published: (2025)

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)