Library Catalog

Bibliographic Details
Main Authors:	Guo, Han; Yang, Songlin; Goel, Tarushii; Xing, Eric P.; Dao, Tri; Kim, Yoon
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.04761

Similar Items

Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
by: Gu, Albert, et al.
Published: (2023)

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
by: Guo, Han, et al.
Published: (2026)

Hardware-Efficient Attention for Fast Decoding
by: Zadouri, Ted, et al.
Published: (2025)

Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)

LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
by: Guo, Han, et al.
Published: (2023)

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
by: Dao, Tri, et al.
Published: (2024)

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
by: Shah, Jay, et al.
Published: (2024)

Fast KV Compaction via Attention Matching
by: Zweiger, Adam, et al.
Published: (2026)

Adaptive Memory Decay for Log-Linear Attention
by: Amin, Yaxita, et al.
Published: (2026)

Speculative Speculative Decoding
by: Kumar, Tanishq, et al.
Published: (2026)

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
by: Mishra, Mayank, et al.
Published: (2026)

PaTH Attention: Position Encoding via Accumulating Householder Transformations
by: Yang, Songlin, et al.
Published: (2025)

In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
by: Goel, Ayush, et al.
Published: (2026)

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
by: Guo, Wentao, et al.
Published: (2025)

Linear Log-Normal Attention with Unbiased Concentration
by: Nahshan, Yury, et al.
Published: (2023)

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
by: Hwang, Sukjun, et al.
Published: (2024)

Fast Matrix Multiplications for Lookup Table-Quantized LLMs
by: Guo, Han, et al.
Published: (2024)

Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function
by: Vennos, Amy, et al.
Published: (2026)

Cottention: Linear Transformers With Cosine Attention
by: Mongaras, Gabriel, et al.
Published: (2024)

ZeroS: Zero-Sum Linear Attention for Efficient Transformers
by: Lu, Jiecheng, et al.
Published: (2026)

Learning to Interpret Weight Differences in Language Models
by: Goel, Avichal, et al.
Published: (2025)

The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
by: Yang, Xikang, et al.
Published: (2024)

The Mamba in the Llama: Distilling and Accelerating Hybrid Models
by: Wang, Junxiong, et al.
Published: (2024)

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
by: Goel, Gautam, et al.
Published: (2026)

BitDelta: Your Fine-Tune May Only Be Worth One Bit
by: Liu, James, et al.
Published: (2024)

SEA: Sparse Linear Attention with Estimated Attention Mask
by: Lee, Heejun, et al.
Published: (2023)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)

Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
by: Schiff, Yair, et al.
Published: (2024)

RAM-Net: Expressive Linear Attention with Selectively Addressable Memory
by: Xiao, Kaicheng, et al.
Published: (2026)

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
by: Wang, Junxiong, et al.
Published: (2025)

Attention Is All You Need for KV Cache in Diffusion LLMs
by: Nguyen-Tri, Quan, et al.
Published: (2025)

Inductive Spatio-Temporal Kriging with Physics-Guided Increment Training Strategy for Air Quality Inference
by: Yang, Songlin, et al.
Published: (2025)

Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs
by: Hy, Truong Son, et al.
Published: (2023)

Scaling Bidirectional Spans and Span Violations in Attention Mechanism
by: Kim, Jongwook, et al.
Published: (2025)

Conditional Generative Framework with Peak-Aware Attention for Robust Chemical Detection under Interferences
by: Yoon, Namkyung, et al.
Published: (2026)

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
by: Goldstein, Daniel, et al.
Published: (2025)

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
by: Zhang, Muru, et al.
Published: (2025)

On the Duality between Gradient Transformations and Adapters
by: Torroba-Hennigen, Lucas, et al.
Published: (2025)