MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Meng, Weikang; Luo, Yadan; Huo, Liangyu; Li, Yingjian; Wang, Yaowei; Li, Xin; Zhang, Zheng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.21137

_version_	1866911714782478336
author	Meng, Weikang; Luo, Yadan; Huo, Liangyu; Li, Yingjian; Wang, Yaowei; Li, Xin; Zhang, Zheng
author_facet	Meng, Weikang; Luo, Yadan; Huo, Liangyu; Li, Yingjian; Wang, Yaowei; Li, Xin; Zhang, Zheng
contents	Linear attention mitigates the quadratic complexity of softmax attention but suffers from a critical loss of expressiveness. We identify two primary causes: (1) The normalization operation cancels the query norm, which breaks the correlation between a query's norm and the spikiness (entropy) of the attention distribution as in softmax attention. (2) Standard techniques for enforcing non-negativity cause destructive information loss by nullifying valid inner-product interactions. To address these challenges, we introduce NaLaFormer, a novel linear attention mechanism built upon a norm$\times$direction (ND) decomposition of the query and key vectors. We leverage each component to solve a distinct problem: The query norm is injected into our kernel to create a query-norm-aware map that restores the attention distribution's spikiness. The direction vectors are processed by a geometric, cosine-based similarity metric that guarantees non-negativity while preserving the rich, fine-grained information of the inner product. We validate NaLaFormer through a comprehensive multi-modal evaluation, where it sets new state-of-the-art benchmarks for linear attention. Our model achieves up to a 7.5% accuracy gain on ImageNet-1K and a 4.7% mIoU improvement on ADE20K over comparable baselines. It demonstrates profound efficiency, reducing peak memory by a transformative 92.3% in token-intensive super-resolution tasks (70K+ tokens). NaLaFormer's versatility is further confirmed as it surpasses strong baselines like Mamba on common-sense reasoning and sets a new state-of-the-art on the Long Range Arena (LRA) benchmark. Code is available at https://github.com/ZacharyMeng/NaLaFormer .
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_21137
institution	arXiv
publishDate	2025
record_format	arxiv
title	Norm$\times$Direction: Restoring the Missing Query Norm in Vision Linear Attention
topic	Machine Learning
url	https://arxiv.org/abs/2506.21137
