| Main Author: | Aquino-Michaels, Keston |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.02227 |
Similar Items
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
by: Piękos, Piotr, et al.
Published: (2025)
Random Initialization of Gated Sparse Adapters
by: Retault, Vi, et al.
Published: (2025)
Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
by: Jin, Zehao, et al.
Published: (2026)
Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)
Simulating Hard Attention Using Soft Attention
by: Yang, Andy, et al.
Published: (2024)
A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse
by: Ganju, Atul, et al.
Published: (2026)
SEA: Sparse Linear Attention with Estimated Attention Mask
by: Lee, Heejun, et al.
Published: (2023)
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
by: Nawrot, Piotr, et al.
Published: (2025)
Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing
by: Roffo, Giorgio, et al.
Published: (2024)
Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)
Block Sparse Flash Attention
by: Ohayon, Daniel, et al.
Published: (2025)
How Sparse Attention Approximates Exact Attention? Your Attention is Naturally $n^C$-Sparse
by: Deng, Yichuan, et al.
Published: (2024)
PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention
by: Chen, Lida, et al.
Published: (2025)
In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)
Unique Hard Attention: A Tale of Two Sides
by: Jerad, Selim, et al.
Published: (2025)
AdaSplash: Adaptive Sparse Flash Attention
by: Gonçalves, Nuno, et al.
Published: (2025)
Scaling Linear Attention with Sparse State Expansion
by: Pan, Yuqi, et al.
Published: (2025)
Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention
by: Zeris, Athanasios
Published: (2026)
STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)
AdaSplash-2: Faster Differentiable Sparse Attention
by: Gonçalves, Nuno, et al.
Published: (2026)
Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)
Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
by: De, Soham, et al.
Published: (2024)
Universe Routing: Why Self-Evolving Agents Need Epistemic Control
by: Wang, Zhaohui Geoffrey
Published: (2026)
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
by: Filipek, Adam
Published: (2025)
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
by: Wen, Kaiyue, et al.
Published: (2024)
BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization
by: Hagos, Desta Haileselassie, et al.
Published: (2025)
ProxyAttn: Guided Sparse Attention via Representative Heads
by: Wang, Yixuan, et al.
Published: (2025)
LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
by: He, Zhengfu, et al.
Published: (2025)
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
by: Fan, Zehao, et al.
Published: (2025)
Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)
NOSA: Native and Offloadable Sparse Attention
by: Huang, Yuxiang, et al.
Published: (2025)
HSR-Enhanced Sparse Attention Acceleration
by: Chen, Bo, et al.
Published: (2024)
SpecAttn: Speculating Sparse Attention
by: Shah, Harsh
Published: (2025)
Trainable Dynamic Mask Sparse Attention
by: Shi, Jingze, et al.
Published: (2025)
Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention
by: Zeris, Athanasios
Published: (2026)
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
by: Frank, Gregory N.
Published: (2026)