Library Catalog

Bibliographic Details
Main Authors:	Guo, Han; Zhang, Jack; Menon, Arjun; Guessous, Driss; Thakkar, Vijay; Kim, Yoon; Dao, Tri
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.19269

Similar Items

Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
by: Shah, Jay, et al.
Published: (2024)

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
by: Dao, Tri, et al.
Published: (2024)

Log-Linear Attention
by: Guo, Han, et al.
Published: (2025)

CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
by: Park, Gunho, et al.
Published: (2025)

A Comparative Analysis of Microrings Based Incoherent Photonic GEMM Accelerators
by: Vatsavai, Sairam Sri, et al.
Published: (2024)

Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
by: Gu, Albert, et al.
Published: (2023)

Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction
by: Dao, Anh-Tuan, et al.
Published: (2026)

Speculative Speculative Decoding
by: Kumar, Tanishq, et al.
Published: (2026)

Hardware-Efficient Attention for Fast Decoding
by: Zadouri, Ted, et al.
Published: (2025)

LP-GEMM: Integrating Layout Propagation into GEMM Operations
by: Carneiro, César Guedes, et al.
Published: (2026)

Improving Black-box Robustness with In-Context Rewriting
by: O'Brien, Kyle, et al.
Published: (2024)

LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
by: Hu, Huanqi, et al.
Published: (2025)

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
by: Guo, Wentao, et al.
Published: (2025)

CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning
by: Hedman, Marcel, et al.
Published: (2026)

On the Duality between Gradient Transformations and Adapters
by: Torroba-Hennigen, Lucas, et al.
Published: (2025)

Steering Code LLMs with Activation Directions for Language and Library Control
by: Rahman, Md Mahbubur, et al.
Published: (2026)

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
by: Hwang, Sukjun, et al.
Published: (2024)

tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
by: Nair, Harideep, et al.
Published: (2024)

TorchAO: PyTorch-Native Training-to-Serving Model Optimization
by: Or, Andrew, et al.
Published: (2025)

More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation
by: Zi, Yangtian, et al.
Published: (2025)

CODA: A Continuous Online Evolve Framework for Deploying HAR Sensing Systems
by: Qiu, Minghui, et al.
Published: (2024)

D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation
by: Liu, I-Chun Arthur, et al.
Published: (2025)

Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing
by: Afifi, S., et al.
Published: (2026)

Towards Fully FP8 GEMM LLM Training at Scale
by: Hernández-Cano, Alejandro, et al.
Published: (2025)

ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks
by: Afifi, Salma, et al.
Published: (2024)

Partially Rewriting a Transformer in Natural Language
by: Paulo, Gonçalo, et al.
Published: (2025)

Accelerating Sparse Ternary GEMM for Quantized ML on Apple Silicon
by: Lipshitz, Baraq, et al.
Published: (2025)

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
by: Mishra, Mayank, et al.
Published: (2026)

Search Your Block Floating Point Scales!
by: Gupta, Tanmaey, et al.
Published: (2026)

Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs
by: Cummins, Chris, et al.
Published: (2024)

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models
by: Cong, Xing, et al.
Published: (2026)

The Mamba in the Llama: Distilling and Accelerating Hybrid Models
by: Wang, Junxiong, et al.
Published: (2024)

Fast KV Compaction via Attention Matching
by: Zweiger, Adam, et al.
Published: (2026)

BitDelta: Your Fine-Tune May Only Be Worth One Bit
by: Liu, James, et al.
Published: (2024)

A method of using RSVD in residual calculation of LowBit GEMM
by: Gu, Hongyaoxing
Published: (2024)

Simplifying Transformer Blocks
by: He, Bobby, et al.
Published: (2023)

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
by: Song, Jiwon, et al.
Published: (2024)

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
by: Sun, Zeyi, et al.
Published: (2025)