Bibliographic Details
Main authors:	Wang, Dustin; Zhu, Rui-Jie; Abreu, Steven; Shan, Yong; Kergan, Taylor; Pan, Yuqi; Chou, Yuhong; Li, Zheng; Zhang, Ge; Huang, Wenhao; Eshraghian, Jason
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2507.06457

_version_	1866918087072153600
author	Wang, Dustin; Zhu, Rui-Jie; Abreu, Steven; Shan, Yong; Kergan, Taylor; Pan, Yuqi; Chou, Yuhong; Li, Zheng; Zhang, Ge; Huang, Wenhao; Eshraghian, Jason
author_facet	Wang, Dustin; Zhu, Rui-Jie; Abreu, Steven; Shan, Yong; Kergan, Taylor; Pan, Yuqi; Chou, Yuhong; Li, Zheng; Zhang, Ge; Huang, Wenhao; Eshraghian, Jason
contents	Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall performance, leading to hybrid architectures that combine linear and full attention layers. Despite extensive hybrid architecture research, the choice of linear attention component has not been deeply explored. We systematically evaluate various linear attention models across generations - vector recurrences to advanced gating mechanisms - both standalone and hybridized. To enable this comprehensive analysis, we trained and open-sourced 72 models: 36 at 340M parameters (20B tokens) and 36 at 1.3B parameters (100B tokens), covering six linear attention variants across five hybridization ratios. Benchmarking on standard language modeling and recall tasks reveals that superior standalone linear models do not necessarily excel in hybrids. While language modeling remains stable across linear-to-full attention ratios, recall significantly improves with increased full attention layers, particularly below a 3:1 ratio. Our study highlights selective gating, hierarchical recurrence, and controlled forgetting as critical for effective hybrid models. We recommend architectures such as HGRN-2 or GatedDeltaNet with a linear-to-full ratio between 3:1 and 6:1 to achieve Transformer-level recall efficiently. Our models are open-sourced at https://huggingface.co/collections/m-a-p/hybrid-linear-attention-research-686c488a63d609d2f20e2b1e.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_06457
institution	arXiv
publishDate	2025
record_format	arxiv
title	A Systematic Analysis of Hybrid Linear Attention
topic	Computation and Language
url	https://arxiv.org/abs/2507.06457