| Main Authors: | Guo, Han, Zhang, Jack, Menon, Arjun, Guessous, Driss, Thakkar, Vijay, Kim, Yoon, Dao, Tri |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.19269 |
Similar Items
Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
by: Shah, Jay, et al.
Published: (2024)
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
by: Dao, Tri, et al.
Published: (2024)
Log-Linear Attention
by: Guo, Han, et al.
Published: (2025)
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
by: Park, Gunho, et al.
Published: (2025)
A Comparative Analysis of Microrings Based Incoherent Photonic GEMM Accelerators
by: Vatsavai, Sairam Sri, et al.
Published: (2024)
Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
by: Gu, Albert, et al.
Published: (2023)
Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction
by: Dao, Anh-Tuan, et al.
Published: (2026)
Speculative Speculative Decoding
by: Kumar, Tanishq, et al.
Published: (2026)
Hardware-Efficient Attention for Fast Decoding
by: Zadouri, Ted, et al.
Published: (2025)
LP-GEMM: Integrating Layout Propagation into GEMM Operations
by: Carneiro, César Guedes, et al.
Published: (2026)
Improving Black-box Robustness with In-Context Rewriting
by: O'Brien, Kyle, et al.
Published: (2024)
LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
by: Hu, Huanqi, et al.
Published: (2025)
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
by: Guo, Wentao, et al.
Published: (2025)
CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning
by: Hedman, Marcel, et al.
Published: (2026)
On the Duality between Gradient Transformations and Adapters
by: Torroba-Hennigen, Lucas, et al.
Published: (2025)
Steering Code LLMs with Activation Directions for Language and Library Control
by: Rahman, Md Mahbubur, et al.
Published: (2026)
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
by: Hwang, Sukjun, et al.
Published: (2024)
tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
by: Nair, Harideep, et al.
Published: (2024)
TorchAO: PyTorch-Native Training-to-Serving Model Optimization
by: Or, Andrew, et al.
Published: (2025)
More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation
by: Zi, Yangtian, et al.
Published: (2025)
CODA: A Continuous Online Evolve Framework for Deploying HAR Sensing Systems
by: Qiu, Minghui, et al.
Published: (2024)
D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation
by: Liu, I-Chun Arthur, et al.
Published: (2025)
Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing
by: Afifi, S., et al.
Published: (2026)
Towards Fully FP8 GEMM LLM Training at Scale
by: Hernández-Cano, Alejandro, et al.
Published: (2025)
ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks
by: Afifi, Salma, et al.
Published: (2024)
Partially Rewriting a Transformer in Natural Language
by: Paulo, Gonçalo, et al.
Published: (2025)
Accelerating Sparse Ternary GEMM for Quantized ML on Apple Silicon
by: Lipshitz, Baraq, et al.
Published: (2025)
M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
by: Mishra, Mayank, et al.
Published: (2026)
Search Your Block Floating Point Scales!
by: Gupta, Tanmaey, et al.
Published: (2026)
Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs
by: Cummins, Chris, et al.
Published: (2024)
RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models
by: Cong, Xing, et al.
Published: (2026)
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
by: Wang, Junxiong, et al.
Published: (2024)
Fast KV Compaction via Attention Matching
by: Zweiger, Adam, et al.
Published: (2026)
BitDelta: Your Fine-Tune May Only Be Worth One Bit
by: Liu, James, et al.
Published: (2024)
A method of using RSVD in residual calculation of LowBit GEMM
by: Gu, Hongyaoxing
Published: (2024)
Simplifying Transformer Blocks
by: He, Bobby, et al.
Published: (2023)
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
by: Song, Jiwon, et al.
Published: (2024)
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
by: Sun, Zeyi, et al.
Published: (2025)