Library Catalog

Bibliographic Details
Main Author:	Huang, Yufeng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.17334

Similar Items

Superlinear Multi-Step Attention
by: Huang, Yufeng
Published: (2026)

A Provable Expressiveness Hierarchy in Hybrid Linear-Full Attention
by: Ye, Xiaowei, et al.
Published: (2026)

Efficient Attention: Attention with Linear Complexities
by: Shen, Zhuoran, et al.
Published: (2018)

Log-Linear Attention
by: Guo, Han, et al.
Published: (2025)

Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)

Exact Linear Attention
by: Ou, Weinuo
Published: (2026)

Kaczmarz Linear Attention
by: Zou, Jiaxuan, et al.
Published: (2026)

SEA: Sparse Linear Attention with Estimated Attention Mask
by: Lee, Heejun, et al.
Published: (2023)

Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
by: Tang, Zhongpan
Published: (2025)

Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
by: Sharma, Agniv, et al.
Published: (2024)

An Analysis of Linear Complexity Attention Substitutes with BEST-RQ
by: Whetten, Ryan, et al.
Published: (2024)

Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
by: Zuo, Yifei, et al.
Published: (2025)

Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)

Token Sample Complexity of Attention
by: Bohbot, Léa, et al.
Published: (2025)

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
by: Nishikawa, Naoki, et al.
Published: (2025)

Cottention: Linear Transformers With Cosine Attention
by: Mongaras, Gabriel, et al.
Published: (2024)

Linear Attention Sequence Parallelism
by: Sun, Weigao, et al.
Published: (2024)

The Key to State Reduction in Linear Attention: A Rank-based Perspective
by: Nazari, Philipp, et al.
Published: (2026)

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
by: Zhang, Yufeng, et al.
Published: (2022)

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
by: Huang, Yulong, et al.
Published: (2026)

Quantum Complex-Valued Self-Attention Model
by: Chen, Fu, et al.
Published: (2025)

Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
by: Chafaa, Irched, et al.
Published: (2025)

InAttention: Linear Context Scaling for Transformers
by: Eisner, Joseph
Published: (2024)

Linear Memory SE(2) Invariant Attention
by: Pronovost, Ethan, et al.
Published: (2025)

Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)

PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention
by: Chen, Lida, et al.
Published: (2025)

Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction
by: Wang, Xiao, et al.
Published: (2026)

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective
by: Boursier, Etienne, et al.
Published: (2025)

Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
by: Jin, Zehao, et al.
Published: (2026)

Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)

Enhancing Linear Attention with Residual Learning
by: Lai, Xunhao, et al.
Published: (2025)

Attention-based clustering
by: Maulen-Soto, Rodrigo, et al.
Published: (2025)

Kimi Linear: An Expressive, Efficient Attention Architecture
by: Kimi Team, et al.
Published: (2025)

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models
by: Deng, Difan, et al.
Published: (2026)

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)

Linear Predictability of Attention Heads in Large Language Models
by: Shaikh, Khalid, et al.
Published: (2026)

WildCat: Near-Linear Attention in Theory and Practice
by: Schröder, Tobias, et al.
Published: (2026)

Hybrid Focal and Full-Range Attention Based Graph Transformers
by: Zhu, Minhong, et al.
Published: (2023)

LUNA: Linear Universal Neural Attention with Generalization Guarantees
by: Shahbazi, Ashkan, et al.
Published: (2025)

Adaptive Memory Decay for Log-Linear Attention
by: Amin, Yaxita, et al.
Published: (2026)