| Main Authors: | Nazari, Philipp; Rusch, T. Konstantin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04852 |
Similar Items
The Curious Case of In-Training Compression of State Space Models
by: Chahine, Makram, et al.
Published: (2025)
Oscillatory State-Space Models
by: Rusch, T. Konstantin, et al.
Published: (2024)
Learning to Dissipate Energy in Oscillatory State-Space Models
by: Boyer, Jared, et al.
Published: (2025)
Low-Pass Flow Matching
by: Ruscio, Francesco M., et al.
Published: (2026)
State Rank Dynamics in Linear Attention LLMs
by: Sun, Ao, et al.
Published: (2026)
Low-Rank Key Value Attention
by: O'Neill, James, et al.
Published: (2026)
Quantifying Memory Use in Reinforcement Learning with Temporal Range
by: Lafuente-Mercado, Rodney, et al.
Published: (2025)
Low Stein Discrepancy via Message-Passing Monte Carlo
by: Kirk, Nathan, et al.
Published: (2025)
Relaxed Equivariance via Multitask Learning
by: Elhag, Ahmed A., et al.
Published: (2024)
Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective
by: Boursier, Etienne, et al.
Published: (2025)
Message-Passing Monte Carlo: Generating low-discrepancy point sets via Graph Neural Networks
by: Rusch, T. Konstantin, et al.
Published: (2024)
LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)
LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
by: Shahbazi, Ashkan, et al.
Published: (2025)
Neural Low-Discrepancy Sequences
by: Van Huffel, Michael Etienne, et al.
Published: (2025)
Rank Reduction Autoencoders
by: Mounayer, Jad, et al.
Published: (2024)
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
by: Wu, Ziyang, et al.
Published: (2024)
Variational Rank Reduction Autoencoders
by: Mounayer, Jad, et al.
Published: (2025)
Scaling Linear Attention with Sparse State Expansion
by: Pan, Yuqi, et al.
Published: (2025)
Power-based Partial Attention: Bridging Linear-Complexity and Full Attention
by: Huang, Yufeng
Published: (2026)
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
by: Ro, Yeonju, et al.
Published: (2025)
On the Benefits of Rank in Attention Layers
by: Amsel, Noah, et al.
Published: (2024)
How does over-squashing affect the power of GNNs?
by: Di Giovanni, Francesco, et al.
Published: (2023)
Low-Rank Tensor Decompositions for the Theory of Neural Networks
by: Borsoi, Ricardo, et al.
Published: (2025)
Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)
Log-Linear Attention
by: Guo, Han, et al.
Published: (2025)
Causal Attention with Lookahead Keys
by: Song, Zhuoqing, et al.
Published: (2025)
Convolutional versus Dense Neural Networks: Comparing the Two Neural Networks Performance in Predicting Building Operational Energy Use Based on the Building Shape
by: Nazari, Farnaz, et al.
Published: (2021)
Kalman Linear Attention: Parallel Bayesian Filtering For Efficient Language Modelling and State Tracking
by: Shaj, Vaisakh, et al.
Published: (2026)
A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA
by: Gupta, Neelesh, et al.
Published: (2026)
A Reduction Algorithm for Markovian Contextual Linear Bandits
by: Buyukkalayci, Kaan, et al.
Published: (2026)
Spiky Rank and Its Applications to Rigidity and Circuits
by: Hambardzumyan, Lianna, et al.
Published: (2026)
Exact Linear Attention
by: Ou, Weinuo
Published: (2026)
Kaczmarz Linear Attention
by: Zou, Jiaxuan, et al.
Published: (2026)
An IoT Framework for Building Energy Optimization Using Machine Learning-based MPC
by: Morteza, Aryan, et al.
Published: (2024)
On the Learnability of Offline Model-Based Optimization: A Ranking Perspective
by: Lyu, Shen-Huan, et al.
Published: (2026)
Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse
by: Baker, Bradley T., et al.
Published: (2024)
Coupled Query-Key Dynamics for Attention
by: Gahtan, Barak, et al.
Published: (2026)
The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions
by: Schmidt, Jonathan, et al.
Published: (2023)
Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)
Ranking In Generalized Linear Bandits
by: Shidani, Amitis, et al.
Published: (2022)