Library Catalog

Bibliographic Details
Main Authors:	Yang, Xu; Zhang, Jiapeng; Zhao, Dongyang; Chen, Guo; Tang, Zhuo
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.14224

Similar Items

HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention
by: Xu, Yufei, et al.
Published: (2026)

PQCache: Product Quantization-based KVCache for Long Context LLM Inference
by: Zhang, Hailin, et al.
Published: (2024)

VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
by: Guanzhong, Chen
Published: (2026)

How Sparse Attention Approximates Exact Attention? Your Attention is Naturally $n^C$-Sparse
by: Deng, Yichuan, et al.
Published: (2024)

vAttention: Verified Sparse Attention
by: Desai, Aditya, et al.
Published: (2025)

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
by: Gao, Yizhao, et al.
Published: (2025)

Block-Attention for Efficient Prefilling
by: Ma, Dongyang, et al.
Published: (2024)

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)

HSR-Enhanced Sparse Attention Acceleration
by: Chen, Bo, et al.
Published: (2024)

Stepwise Self-Consistent Mathematical Reasoning with Large Language Models
by: Zhao, Zilong, et al.
Published: (2024)

Sparse Attention Regression Network Based Soil Fertility Prediction With Ummaso
by: Rao, R V Raghavendra, et al.
Published: (2024)

Improving Sparse Autoencoder with Dynamic Attention
by: Wang, Dongsheng, et al.
Published: (2026)

NOSA: Native and Offloadable Sparse Attention
by: Huang, Yuxiang, et al.
Published: (2025)

Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
by: Dev, Arundhathi, et al.
Published: (2026)

Sparse Low-Ranked Self-Attention Transformer for Remaining Useful Lifetime Prediction of Optical Fiber Amplifiers
by: Schneider, Dominic, et al.
Published: (2024)

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
by: Xu, Hongtao, et al.
Published: (2026)

AgentOCR: Reimagining Agent History via Optical Self-Compression
by: Feng, Lang, et al.
Published: (2026)

MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
by: Zhou, Ruijie, et al.
Published: (2026)

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Self-Attention Transformers
by: Karbevski, Marko, et al.
Published: (2025)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)

Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference
by: Le, Hoang Anh Duy, et al.
Published: (2026)

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
by: Wang, Yuanzhe, et al.
Published: (2026)

S2O: Early Stopping for Sparse Attention via Online Permutation
by: Zhang, Yu, et al.
Published: (2026)

Spatial-Temporal Attention Model for Traffic State Estimation with Sparse Internet of Vehicles
by: Xue, Jianzhe, et al.
Published: (2024)

PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference
by: Zhao, Yushu, et al.
Published: (2025)

STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios
by: Xu, Dongyang, et al.
Published: (2024)

Towards Robust Knowledge Tracing Models via k-Sparse Attention
by: Huang, Shuyan, et al.
Published: (2024)

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
by: Yang, Lijie, et al.
Published: (2024)

Sparse Adapter Fusion for Continual Learning in NLP
by: Zeng, Min, et al.
Published: (2026)

Double-P: Hierarchical Top-P Sparse Attention for Long-Context LLMs
by: Ni, Wentao, et al.
Published: (2026)

A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance
by: Tzachristas, Georgios, et al.
Published: (2025)

Hierarchical Sparse Plus Low Rank Compression of LLM
by: Kumar, Pawan, et al.
Published: (2025)

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling
by: Grooten, Bram, et al.
Published: (2025)

SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)

Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving
by: Gao, Wei, et al.
Published: (2025)

Dual-Prototype Disentanglement: A Context-Aware Enhancement Framework for Time Series Forecasting
by: Yang, Haonan, et al.
Published: (2026)

GL-LFGNN: A Global-Local Dual-branch Causal Graph Neural Network Based on Liang-Kleeman Information Flow for EEG Emotion Recognition
by: Wang, Ziyi, et al.
Published: (2026)

EF-LLM: Energy Forecasting LLM with AI-assisted Automation, Enhanced Sparse Prediction, Hallucination Detection
by: Qiu, Zihang, et al.
Published: (2024)

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
by: Zhao, Yilong, et al.
Published: (2025)