:: Library Catalog

Bibliographic Details
Main Authors:	Nazari, Philipp; Rusch, T. Konstantin
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.04852

Similar Items

The Curious Case of In-Training Compression of State Space Models
by: Chahine, Makram, et al.
Published: (2025)

Oscillatory State-Space Models
by: Rusch, T. Konstantin, et al.
Published: (2024)

Learning to Dissipate Energy in Oscillatory State-Space Models
by: Boyer, Jared, et al.
Published: (2025)

Low-Pass Flow Matching
by: Ruscio, Francesco M., et al.
Published: (2026)

State Rank Dynamics in Linear Attention LLMs
by: Sun, Ao, et al.
Published: (2026)

Low-Rank Key Value Attention
by: O'Neill, James, et al.
Published: (2026)

Quantifying Memory Use in Reinforcement Learning with Temporal Range
by: Lafuente-Mercado, Rodney, et al.
Published: (2025)

Low Stein Discrepancy via Message-Passing Monte Carlo
by: Kirk, Nathan, et al.
Published: (2025)

Relaxed Equivariance via Multitask Learning
by: Elhag, Ahmed A., et al.
Published: (2024)

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective
by: Boursier, Etienne, et al.
Published: (2025)

Message-Passing Monte Carlo: Generating low-discrepancy point sets via Graph Neural Networks
by: Rusch, T. Konstantin, et al.
Published: (2024)

LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)

LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
by: Shahbazi, Ashkan, et al.
Published: (2025)

Neural Low-Discrepancy Sequences
by: Van Huffel, Michael Etienne, et al.
Published: (2025)

Rank Reduction Autoencoders
by: Mounayer, Jad, et al.
Published: (2024)

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
by: Wu, Ziyang, et al.
Published: (2024)

Variational Rank Reduction Autoencoders
by: Mounayer, Jad, et al.
Published: (2025)

Scaling Linear Attention with Sparse State Expansion
by: Pan, Yuqi, et al.
Published: (2025)

Power-based Partial Attention: Bridging Linear-Complexity and Full Attention
by: Huang, Yufeng
Published: (2026)

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
by: Ro, Yeonju, et al.
Published: (2025)

On the Benefits of Rank in Attention Layers
by: Amsel, Noah, et al.
Published: (2024)

How does over-squashing affect the power of GNNs?
by: Di Giovanni, Francesco, et al.
Published: (2023)

Low-Rank Tensor Decompositions for the Theory of Neural Networks
by: Borsoi, Ricardo, et al.
Published: (2025)

Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)

Log-Linear Attention
by: Guo, Han, et al.
Published: (2025)

Causal Attention with Lookahead Keys
by: Song, Zhuoqing, et al.
Published: (2025)

Convolutional versus Dense Neural Networks: Comparing the Two Neural Networks Performance in Predicting Building Operational Energy Use Based on the Building Shape
by: Nazari, Farnaz, et al.
Published: (2021)

Kalman Linear Attention: Parallel Bayesian Filtering For Efficient Language Modelling and State Tracking
by: Shaj, Vaisakh, et al.
Published: (2026)

A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA
by: Gupta, Neelesh, et al.
Published: (2026)

A Reduction Algorithm for Markovian Contextual Linear Bandits
by: Buyukkalayci, Kaan, et al.
Published: (2026)

Spiky Rank and Its Applications to Rigidity and Circuits
by: Hambardzumyan, Lianna, et al.
Published: (2026)

Exact Linear Attention
by: Ou, Weinuo
Published: (2026)

Kaczmarz Linear Attention
by: Zou, Jiaxuan, et al.
Published: (2026)

An IoT Framework for Building Energy Optimization Using Machine Learning-based MPC
by: Morteza, Aryan, et al.
Published: (2024)

On the Learnability of Offline Model-Based Optimization: A Ranking Perspective
by: Lyu, Shen-Huan, et al.
Published: (2026)

Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse
by: Baker, Bradley T., et al.
Published: (2024)

Coupled Query-Key Dynamics for Attention
by: Gahtan, Barak, et al.
Published: (2026)

The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions
by: Schmidt, Jonathan, et al.
Published: (2023)

Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)

Ranking In Generalized Linear Bandits
by: Shidani, Amitis, et al.
Published: (2022)