| Main Authors: | Faye, Bilal; Mbaye, Abdoulaye; Azzag, Hanane; Lebbah, Mustapha |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.22583 |
Similar Items
Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection
by: Faye, Bilal, et al.
Published: (2024)
Prototype-Guided Diffusion: Visual Conditioning without External Memory
by: Faye, Bilal, et al.
Published: (2025)
Supervised Batch Normalization
by: Faye, Bilal, et al.
Published: (2024)
Value-Free Policy Optimization via Reward Partitioning
by: Faye, Bilal, et al.
Published: (2025)
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
by: Faye, Bilal, et al.
Published: (2024)
Unsupervised Adaptive Normalization
by: Faye, Bilal, et al.
Published: (2024)
Lightweight Cross-Modal Representation Learning
by: Faye, Bilal, et al.
Published: (2024)
Adaptative Context Normalization: A Boost for Deep Learning in Image Processing
by: Faye, Bilal, et al.
Published: (2024)
Enhancing Neural Network Representations with Prior Knowledge-Based Normalization
by: Faye, Bilal, et al.
Published: (2024)
Game Theory Meets Statistical Mechanics in Deep Learning Design
by: Bouchaffra, Djamel, et al.
Published: (2024)
Context Normalization Layer with Applications
by: Faye, Bilal, et al.
Published: (2023)
Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models
by: Devynck, Tom, et al.
Published: (2026)
Coalition Free Energy and Adaptive Precision in Multi-Agent Cooperation
by: Bouchaffra, Djamel, et al.
Published: (2026)
NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics
by: Bouchaffra, Djamel, et al.
Published: (2026)
Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model
by: Khoufache, Reda, et al.
Published: (2024)
MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing
by: Radouane, Karim, et al.
Published: (2025)
A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics
by: Bouchaffra, Djamel, et al.
Published: (2026)
Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection
by: Amekoe, Kodjo Mawuena, et al.
Published: (2024)
Manual Verbalizer Enrichment for Few-Shot Text Classification
by: Nguyen, Quang Anh, et al.
Published: (2024)
MoH: Multi-Head Attention as Mixture-of-Head Attention
by: Jin, Peng, et al.
Published: (2024)
Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)
Interleaved Head Attention
by: Duvvuri, Sai Surya, et al.
Published: (2026)
Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)
Efficient LLMs with AMP: Attention Heads and MLP Pruning
by: Mugnaini, Leandro Giusti, et al.
Published: (2025)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
by: Chen, Yilong, et al.
Published: (2024)
CHAI: Clustered Head Attention for Efficient LLM Inference
by: Agarwal, Saurabh, et al.
Published: (2024)
Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
by: Donhauser, Konstantin, et al.
Published: (2025)
Boosting House Price Estimations with Multi-Head Gated Attention
by: Sellam, Zakaria Abdellah, et al.
Published: (2024)
Geometric Analysis of Token Selection in Multi-Head Attention
by: Mudarisov, Timur, et al.
Published: (2026)
Improving Transformers with Dynamically Composable Multi-Head Attention
by: Xiao, Da, et al.
Published: (2024)
Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)
Efficient Image Generation with Variadic Attention Heads
by: Walton, Steven, et al.
Published: (2022)
Benign Overfitting in Single-Head Attention
by: Magen, Roey, et al.
Published: (2024)
A Capacity-Based Rationale for Multi-Head Attention
by: Adler, Micah
Published: (2025)
Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism
by: Cui, Chenwei, et al.
Published: (2026)
Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention
by: Borde, Haitz Sáez de Ocáriz
Published: (2025)
Multi-Head Spectral-Adaptive Graph Anomaly Detection
by: Cao, Qingyue, et al.
Published: (2025)
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
by: Otsuka, Hikari, et al.
Published: (2025)
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)