| Main Authors: | Wen, Qishuai, Huang, Zhiyuan, Meng, Xianghan, He, Wei, Li, Chun-Guang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01219 |
Similar Items
Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
by: Wen, Qishuai, et al.
Published: (2025)
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
by: Wen, Qishuai, et al.
Published: (2024)
Exploring a Principled Framework for Deep Subspace Clustering
by: Meng, Xianghan, et al.
Published: (2025)
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
by: Zhang, Jintao, et al.
Published: (2026)
Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery
by: He, Wei, et al.
Published: (2026)
Jointly Learning Structured Representations and Stabilized Affinity for Human Motion Segmentation
by: Meng, Xianghan, et al.
Published: (2026)
Temporal Rate Reduction Clustering for Human Motion Segmentation
by: Meng, Xianghan, et al.
Published: (2025)
MoH: Multi-Head Attention as Mixture-of-Head Attention
by: Jin, Peng, et al.
Published: (2024)
Bootstrapping Top-down Information for Self-modulating Slot Attention
by: Kim, Dongwon, et al.
Published: (2024)
Mixture of Distributions Matters: Dynamic Sparse Attention for Efficient Video Diffusion Transformers
by: Liu, Yuxi, et al.
Published: (2026)
Elastic Attention Cores for Scalable Vision Transformers
by: Song, Alan Z., et al.
Published: (2026)
Learning Informative Attention Weights for Person Re-Identification
by: Wang, Yancheng, et al.
Published: (2025)
Clebsch-Gordan Transformer: Fast and Global Equivariant Attention
by: Howell, Owen Lewis, et al.
Published: (2025)
Reinforced Attention Learning
by: Li, Bangzheng, et al.
Published: (2026)
Memory Efficient Neural Processes via Constant Memory Attention Block
by: Feng, Leo, et al.
Published: (2023)
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
by: Shen, Li, et al.
Published: (2024)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
by: Tang, Anke, et al.
Published: (2024)
MonarchRT: Efficient Attention for Real-Time Video Generation
by: Agarwal, Krish, et al.
Published: (2026)
SageAttention2++: A More Efficient Implementation of SageAttention2
by: Zhang, Jintao, et al.
Published: (2025)
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
by: Lee, Byung Hyun, et al.
Published: (2025)
Uncertainty-Guided Attention and Entropy-Weighted Loss for Precise Plant Seedling Segmentation
by: Ehab, Mohamed, et al.
Published: (2026)
A Scalable Attention-Based Approach for Image-to-3D Texture Mapping
by: Rampini, Arianna, et al.
Published: (2025)
GD-FPS: Growth-Driven Feedforward Parameter Selection for Efficient Fine-Tuning
by: Yang, Kenneth, et al.
Published: (2025)
Attention in Geometry: Scalable Spatial Modeling via Adaptive Density Fields and FAISS-Accelerated Kernels
by: Fan, Zhaowen
Published: (2026)
ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
by: Hsu, Chih-Chung, et al.
Published: (2026)
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
by: Guo, Yichen, et al.
Published: (2025)
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
by: Becker, Philipp, et al.
Published: (2025)
Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)
DataDAM: Efficient Dataset Distillation with Attention Matching
by: Sajedi, Ahmad, et al.
Published: (2023)
Synthesizer Based Efficient Self-Attention for Vision Tasks
by: Zhu, Guangyang, et al.
Published: (2022)
FasterViT: Fast Vision Transformers with Hierarchical Attention
by: Hatamizadeh, Ali, et al.
Published: (2023)
Shiva-DiT: Residual-Based Differentiable Top-$k$ Selection for Efficient Diffusion Transformers
by: Zhang, Jiaji, et al.
Published: (2026)
SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing
by: Li, Sheng, et al.
Published: (2024)
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
by: Gabetni, Firas, et al.
Published: (2025)
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference
by: Pathak, Surendra, et al.
Published: (2026)
AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
by: Shan, Jiquan, et al.
Published: (2025)
Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability
by: Oikarinen, Tuomas, et al.
Published: (2025)
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
by: Hassani, Ali, et al.
Published: (2024)
Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification
by: Wang, Zitai, et al.
Published: (2024)
Reflecting Topology Consistency and Abnormality via Learnable Attentions for Airway Labeling
by: Li, Chenyu, et al.
Published: (2024)