:: Library Catalog

Bibliographic Details
Main Author:	Aquino-Michaels, Keston
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2603.02227

Similar Items

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
by: Piękos, Piotr, et al.
Published: (2025)

Random Initialization of Gated Sparse Adapters
by: Retault, Vi, et al.
Published: (2025)

Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
by: Jin, Zehao, et al.
Published: (2026)

Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)

Simulating Hard Attention Using Soft Attention
by: Yang, Andy, et al.
Published: (2024)

A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse
by: Ganju, Atul, et al.
Published: (2026)

SEA: Sparse Linear Attention with Estimated Attention Mask
by: Lee, Heejun, et al.
Published: (2023)

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
by: Nawrot, Piotr, et al.
Published: (2025)

Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing
by: Roffo, Giorgio, et al.
Published: (2024)

Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)

Block Sparse Flash Attention
by: Ohayon, Daniel, et al.
Published: (2025)

How Sparse Attention Approximates Exact Attention? Your Attention is Naturally $n^C$-Sparse
by: Deng, Yichuan, et al.
Published: (2024)

PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention
by: Chen, Lida, et al.
Published: (2025)

In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)

Unique Hard Attention: A Tale of Two Sides
by: Jerad, Selim, et al.
Published: (2025)

AdaSplash: Adaptive Sparse Flash Attention
by: Gonçalves, Nuno, et al.
Published: (2025)

Scaling Linear Attention with Sparse State Expansion
by: Pan, Yuqi, et al.
Published: (2025)

Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention
by: Zeris, Athanasios
Published: (2026)

STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)

AdaSplash-2: Faster Differentiable Sparse Attention
by: Gonçalves, Nuno, et al.
Published: (2026)

Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
by: De, Soham, et al.
Published: (2024)

Universe Routing: Why Self-Evolving Agents Need Epistemic Control
by: Wang, Zhaohui Geoffrey
Published: (2026)

Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
by: Filipek, Adam
Published: (2025)

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
by: Wen, Kaiyue, et al.
Published: (2024)

BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization
by: Hagos, Desta Haileselassie, et al.
Published: (2025)

ProxyAttn: Guided Sparse Attention via Representative Heads
by: Wang, Yixuan, et al.
Published: (2025)

LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)

Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
by: He, Zhengfu, et al.
Published: (2025)

Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
by: Fan, Zehao, et al.
Published: (2025)

Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)

NOSA: Native and Offloadable Sparse Attention
by: Huang, Yuxiang, et al.
Published: (2025)

HSR-Enhanced Sparse Attention Acceleration
by: Chen, Bo, et al.
Published: (2024)

SpecAttn: Speculating Sparse Attention
by: Shah, Harsh
Published: (2025)

Trainable Dynamic Mask Sparse Attention
by: Shi, Jingze, et al.
Published: (2025)

Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention
by: Zeris, Athanasios
Published: (2026)

Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
by: Frank, Gregory N.
Published: (2026)