Bibliographic Details
Main Authors: Meng, Weikang; Huo, Liangyu; Luo, Yadan; Wang, Yaowei; Li, Yingjian; Zhang, Zheng
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.04346
author Meng, Weikang
Huo, Liangyu
Luo, Yadan
Wang, Yaowei
Li, Yingjian
Zhang, Zheng
contents Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We identify the root cause of this degradation as the non-negativity constraint imposed on kernel feature maps: standard projections like ReLU act as "passive truncation" operators, indiscriminately discarding semantic information residing in the negative domain. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. By leveraging learnable Householder reflections, MirrorLA rotates the feature geometry into the non-negative orthant to maximize information retention. Our approach restores representational density through a cohesive, multi-scale design: it first optimizes local discriminability via block-wise isometries, stabilizes long-context dynamics using variance-aware modulation to diversify activations, and finally, integrates dispersed subspaces via cross-head reflections to induce global covariance mixing. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
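The contrast the abstract draws between "passive truncation" and "active reorientation" can be illustrated with a small NumPy sketch. This is not the MirrorLA implementation: the helper names (`householder`, `relu`, `linear_attention`), the hand-picked reflection vector, and the toy shapes are all assumptions made for illustration. In the paper the reflections are learned, and the block-wise, variance-aware, and cross-head components are not modelled here. The sketch only shows that a Householder reflection is an isometry that can move a vector into the non-negative orthant without shrinking it, whereas ReLU alone zeroes it, and that the resulting non-negative features can be used in linear attention.

```python
import numpy as np

def householder(v):
    """Reflection H = I - 2 v v^T / (v^T v) across the hyperplane orthogonal to v."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)

def relu(x):
    """Element-wise truncation of negative entries."""
    return np.maximum(x, 0.0)

def linear_attention(q_feat, k_feat, values, eps=1e-6):
    """Non-causal linear attention on non-negative features.

    Computes phi(Q) (phi(K)^T V) with row-wise normalisation, so the
    n x n score matrix is never formed.
    """
    kv = k_feat.T @ values                   # (d, d_v)
    norm = q_feat @ k_feat.sum(axis=0)       # (n,)
    return (q_feat @ kv) / (norm[:, None] + eps)

# 1) Truncation vs. reflection on one vector that lies in the negative domain.
x = np.array([-3.0, -4.0])
target = np.ones(2) / np.sqrt(2.0)           # unit vector inside the non-negative orthant
# Standard construction: reflecting across v = x - ||x|| * target maps x to ||x|| * target.
# The vector is hand-picked here; it is NOT learned as in the paper.
H = householder(x - np.linalg.norm(x) * target)
truncated = relu(x)                          # every coordinate is zeroed
reflected = H @ x                            # same norm as x, all coordinates non-negative

# 2) Using the reflection as a pre-activation step in toy linear attention.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 2))
k = rng.standard_normal((8, 2))
vals = rng.standard_normal((8, 3))
q_feat = relu(q @ H)                         # H is symmetric, so q @ H reflects each row
k_feat = relu(k @ H)
out = linear_attention(q_feat, k_feat, vals)

# Reference: the same result computed through the explicit n x n score matrix.
scores = q_feat @ k_feat.T
out_quadratic = (scores @ vals) / (scores.sum(axis=1, keepdims=True) + 1e-6)
```

Because a single fixed reflection cannot place every token in the non-negative orthant, the ReLU after the reflection is kept in this sketch to guarantee non-negative features; how MirrorLA handles this in practice is described in the paper, not here.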
format Preprint
id arxiv_https___arxiv_org_abs_2602_04346
institution arXiv
publishDate 2026
record_format arxiv
title MirrorLA: Reflecting Feature Map for Vision Linear Attention
topic Machine Learning
url https://arxiv.org/abs/2602.04346