Bibliographic Details
Main Authors: Van Nguyen, Chien, Nguyen, Huy, Zhang, Ruiyi, Deilamsalehy, Hanieh, Mathur, Puneet, Lai, Viet Dac, Wang, Haoliang, Subramanian, Jayakumar, Rossi, Ryan A., Bui, Trung, Vlassis, Nikos, Dernoncourt, Franck, Nguyen, Thien Huu
Format: Preprint
Published: 2025
Subjects: Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2507.09025
contents We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks on long sequences because of the quadratic complexity of softmax attention and the Key-Value (KV) cache, which grows with context length and makes inference memory-bound. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving model quality. Unlike prior linearization methods, which are constrained by fixed, non-adaptive structures, Lizard augments the architecture with compact, learnable modules that enable adaptive memory control and robust length generalization. Moreover, we introduce a hardware-aware algorithm that addresses numerical instability in gated attention to accelerate training. Extensive experiments show that Lizard achieves near-lossless recovery of its teacher model's performance, significantly outperforming previous methods by 9.4 to 24.5 points on the 5-shot MMLU benchmark and demonstrating superior associative recall.
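The abstract contrasts softmax attention, whose KV cache grows with context length, with a subquadratic attention whose memory stays fixed. The Python sketch below illustrates that contrast with generic causal gated linear attention; it is not Lizard's actual mechanism. The elu(x)+1 feature map, the scalar per-step gate, and the names `gated_linear_attention_recurrent`, `gated_linear_attention_parallel`, and `memory_floats` are hypothetical choices made for illustration. The sketch computes the same outputs two ways, as a recurrence over a fixed-size state and as an explicit weighted sum over past tokens, and counts the floats each approach keeps per head.

```python
import math
import random


def phi(x):
    """Positive feature map elu(x) + 1 (hypothetical choice; Lizard's may differ)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def gated_linear_attention_recurrent(qs, ks, vs, gates):
    """Causal gated linear attention computed with a fixed-size state.

    S (d_k x d_v) and the normalizer z (d_k) are decayed by a scalar gate
    g_t and then updated with the current key/value, so the state size does
    not depend on sequence length.
    """
    d_k, d_v = len(qs[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outputs = []
    for q, k, v, g in zip(qs, ks, vs, gates):
        fq, fk = phi(q), phi(k)
        for a in range(d_k):
            z[a] = g * z[a] + fk[a]
            for b in range(d_v):
                S[a][b] = g * S[a][b] + fk[a] * v[b]
        denom = dot(fq, z)
        outputs.append([sum(fq[a] * S[a][b] for a in range(d_k)) / denom
                        for b in range(d_v)])
    return outputs


def gated_linear_attention_parallel(qs, ks, vs, gates):
    """Same outputs written as an explicit weighted sum over all past tokens."""
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        # Weight of token i <= t: (g_{i+1} * ... * g_t) * phi(q_t) . phi(k_i)
        weights = [math.prod(gates[i + 1:t + 1]) * dot(fq, phi(ks[i]))
                   for i in range(t + 1)]
        total = sum(weights)
        outputs.append([sum(w * vs[i][b] for i, w in enumerate(weights)) / total
                        for b in range(d_v)])
    return outputs


def memory_floats(seq_len, d_k, d_v):
    """Floats kept per head: softmax KV cache vs. gated linear attention state."""
    kv_cache = seq_len * (d_k + d_v)  # one key and one value per past token
    linear_state = d_k * d_v + d_k    # S plus normalizer z, independent of seq_len
    return kv_cache, linear_state


# Usage: random inputs; both forms should agree up to rounding error.
rng = random.Random(0)
T, d_k, d_v = 8, 4, 3
qs = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(T)]
ks = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(T)]
vs = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(T)]
gates = [rng.uniform(0.8, 1.0) for _ in range(T)]  # hypothetical scalar gates
rec = gated_linear_attention_recurrent(qs, ks, vs, gates)
par = gated_linear_attention_parallel(qs, ks, vs, gates)
max_diff = max(abs(a - b) for ra, rb in zip(rec, par) for a, b in zip(ra, rb))
print(f"max |recurrent - parallel| = {max_diff:.2e}")
print(memory_floats(4096, 64, 64))  # (524288, 4160)
```

The recurrent form is what keeps inference memory independent of sequence length. The parallel form multiplies runs of gates together, and long products of gates below 1 can underflow in floating point, which is one reason gated attention needs numerical care during training. How Lizard's hardware-aware algorithm handles this is described in the paper and is not reproduced here.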
institution arXiv
title Lizard: An Efficient Linearization Framework for Large Language Models
topic Computation and Language
Machine Learning