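| Main Authors: | Van Nguyen, Chien; Nguyen, Huy; Zhang, Ruiyi; Deilamsalehy, Hanieh; Mathur, Puneet; Lai, Viet Dac; Wang, Haoliang; Subramanian, Jayakumar; Rossi, Ryan A.; Bui, Trung; Vlassis, Nikos; Dernoncourt, Franck; Nguyen, Thien Huu |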
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language; Machine Learning |
| Online Access: | https://arxiv.org/abs/2507.09025 |
| _version_ | 1866917416493121536 |
|---|---|
| author | Van Nguyen, Chien; Nguyen, Huy; Zhang, Ruiyi; Deilamsalehy, Hanieh; Mathur, Puneet; Lai, Viet Dac; Wang, Haoliang; Subramanian, Jayakumar; Rossi, Ryan A.; Bui, Trung; Vlassis, Nikos; Dernoncourt, Franck; Nguyen, Thien Huu |
| author_facet | Van Nguyen, Chien; Nguyen, Huy; Zhang, Ruiyi; Deilamsalehy, Hanieh; Mathur, Puneet; Lai, Viet Dac; Wang, Haoliang; Subramanian, Jayakumar; Rossi, Ryan A.; Bui, Trung; Vlassis, Nikos; Dernoncourt, Franck; Nguyen, Thien Huu |
| contents | We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks on long sequences, owing to the quadratic complexity of softmax attention and to the growing Key-Value (KV) cache, which makes inference memory-bound by context length. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving model quality. Unlike prior linearization methods constrained by fixed, non-adaptive structures, Lizard augments the architecture with compact, learnable modules that enable adaptive memory control and robust length generalization. Moreover, we introduce a hardware-aware algorithm that solves numerical instability in gated attention to accelerate training. Extensive experiments show that Lizard achieves near-lossless recovery of its teacher model's performance, significantly outperforming previous methods by 9.4 to 24.5 points on the 5-shot MMLU benchmark and demonstrating superior associative recall. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2507_09025 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Lizard: An Efficient Linearization Framework for Large Language Models |
| topic | Computation and Language; Machine Learning |
| url | https://arxiv.org/abs/2507.09025 |
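
The abstract contrasts softmax attention, whose cost and KV cache grow with context length, with a gated subquadratic mechanism. The sketch below is not Lizard's algorithm, which this record does not describe in detail. It is a generic gated linear attention recurrence, written under the assumption of one scalar gate per token and no normalization, to illustrate why such a mechanism needs only a fixed-size state. The function names and the scalar-gate form are illustrative choices, not taken from the paper.

```python
from typing import List

Vector = List[float]


def gated_linear_attention_recurrent(
    qs: List[Vector], ks: List[Vector], vs: List[Vector], gates: List[float]
) -> List[Vector]:
    """Process tokens one at a time, keeping a fixed-size d_k x d_v state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    # The state size does not depend on how many tokens have been seen.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(qs, ks, vs, gates):
        # Decay the old memory by the (assumed scalar) gate,
        # then write the outer product of the key and value.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = g * state[i][j] + k[i] * v[j]
        # Read out: output = q @ state.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def gated_linear_attention_quadratic(
    qs: List[Vector], ks: List[Vector], vs: List[Vector], gates: List[float]
) -> List[Vector]:
    """Reference form that revisits every earlier token, as a KV cache would."""
    d_v = len(vs[0])
    outputs = []
    for t in range(len(qs)):
        out = [0.0] * d_v
        for s in range(t + 1):
            # Cumulative decay applied to token s by the gates at s+1..t.
            decay = 1.0
            for r in range(s + 1, t + 1):
                decay *= gates[r]
            score = decay * sum(a * b for a, b in zip(qs[t], ks[s]))
            for j in range(d_v):
                out[j] += score * vs[s][j]
        outputs.append(out)
    return outputs
```

Both functions compute the same outputs. The first stores only the d_k x d_v state, while the second must keep every past key and value, which is the memory growth the abstract attributes to the KV cache.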