| Main Authors: | Zou, Jiaxuan; Ren, Ruifeng; Liu, Yong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.08587 |
Similar Items
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
by: Ren, Ruifeng, et al.
Published: (2025)
Exploring the Limitations of Mamba in COPY and CoT Reasoning
by: Ren, Ruifeng, et al.
Published: (2024)
T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method
by: Zhu, Haoyu, et al.
Published: (2025)
Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)
Capabilities and Fundamental Limits of Latent Chain-of-Thought
by: Zou, Jiaxuan, et al.
Published: (2026)
KVBuffer: IO-aware Serving for Linear Attention
by: Zou, Longwei, et al.
Published: (2026)
Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
by: Yao, Xinhao, et al.
Published: (2025)
Effective Frontiers: A Unification of Neural Scaling Laws
by: Zou, Jiaxuan, et al.
Published: (2026)
Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)
Exact Linear Attention
by: Ou, Weinuo
Published: (2026)
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
by: Zuo, Yifei, et al.
Published: (2025)
State Rank Dynamics in Linear Attention LLMs
by: Sun, Ao, et al.
Published: (2026)
Enhancing Linear Attention with Residual Learning
by: Lai, Xunhao, et al.
Published: (2025)
Adaptive Memory Decay for Log-Linear Attention
by: Amin, Yaxita, et al.
Published: (2026)
Linear Attention is Enough in Spatial-Temporal Forecasting
by: Ning, Xinyu
Published: (2024)
Linear Attention for Efficient Bidirectional Sequence Modeling
by: Afzal, Arshia, et al.
Published: (2025)
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
by: Beck, Maximilian, et al.
Published: (2025)
Employee Turnover Prediction: A Cross-component Attention Transformer with Consideration of Competitor Influence and Contagious Effect
by: Liu, Hao, et al.
Published: (2025)
Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
by: Karbevski, Marko
Published: (2026)
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
by: Lu, Jiecheng, et al.
Published: (2026)
RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
by: Joshi, Sahil, et al.
Published: (2025)
Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)
Learning under Quantization for High-Dimensional Linear Regression
by: Zhang, Dechen, et al.
Published: (2025)
E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory
by: Huang, Lin, et al.
Published: (2026)
Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain
by: Wi, Hyowon, et al.
Published: (2025)
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
by: Chou, Yuhong, et al.
Published: (2024)
LNUCB-TA: Linear-nonlinear Hybrid Bandit Learning with Temporal Attention
by: Khosravi, Hamed, et al.
Published: (2025)
Scaling Laws for Precision in High-Dimensional Linear Regression
by: Zhang, Dechen, et al.
Published: (2026)
Efficient Linear Attention for Multivariate Time Series Modeling via Entropy Equality
by: Zhang, Mingtao, et al.
Published: (2025)
Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
by: Lu, Jiecheng, et al.
Published: (2025)
Geometry-Aware Contrastive Learning for Few-Shot Automatic Modulation Recognition
by: Zhao, Guanqun, et al.
Published: (2026)
Attention-Aided MMSE for OFDM Channel Estimation: Learning Linear Filters with Attention
by: Ha, TaeJun, et al.
Published: (2025)
In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
by: Goel, Ayush, et al.
Published: (2026)
Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction
by: Wang, Xiao, et al.
Published: (2026)
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)
FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression
by: Gao, Yifei, et al.
Published: (2025)
Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
by: Chen, Xingwu, et al.
Published: (2025)
Parallax: Parameterized Local Linear Attention for Language Modeling
by: Zuo, Yifei, et al.
Published: (2026)
Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision
by: Chen, Jiaxuan, et al.
Published: (2025)
Pretrained battery transformer (PBT): A foundation model for universal battery life prediction
by: Tan, Ruifeng, et al.
Published: (2025)