Staff View: Log-Linear Attention :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Han; Yang, Songlin; Goel, Tarushii; Xing, Eric P.; Dao, Tri; Kim, Yoon
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.04761

_version_	1866908857597427712
author	Guo, Han; Yang, Songlin; Goel, Tarushii; Xing, Eric P.; Dao, Tri; Kim, Yoon
author_facet	Guo, Han; Yang, Songlin; Goel, Tarushii; Xing, Eric P.; Dao, Tri; Kim, Yoon
contents	The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity however remain significant bottlenecks. Linear attention and state-space models enable linear-time, constant-memory sequence modeling and can moreover be trained efficiently through matmul-rich parallelization across sequence length. However, at their core these models are still RNNs, and thus their use of a fixed-size hidden state to model the context is a fundamental limitation. This paper develops log-linear attention, an attention mechanism that balances linear attention's efficiency and the expressiveness of softmax attention. Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants. As case studies, we instantiate log-linear variants of two recent architectures -- Mamba-2 and Gated DeltaNet -- and find they perform well compared to their linear-time variants.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_04761
institution	arXiv
publishDate	2025
record_format	arxiv
title	Log-Linear Attention
topic	Machine Learning
url	https://arxiv.org/abs/2506.04761
