| Main Authors: | Yang, Xu; Zhang, Jiapeng; Zhao, Dongyang; Chen, Guo; Tang, Zhuo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.14224 |
Similar Items
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention
by: Xu, Yufei, et al.
Published: (2026)
PQCache: Product Quantization-based KVCache for Long Context LLM Inference
by: Zhang, Hailin, et al.
Published: (2024)
VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
by: Guanzhong, Chen
Published: (2026)
How Sparse Attention Approximates Exact Attention? Your Attention is Naturally $n^C$-Sparse
by: Deng, Yichuan, et al.
Published: (2024)
vAttention: Verified Sparse Attention
by: Desai, Aditya, et al.
Published: (2025)
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
by: Gao, Yizhao, et al.
Published: (2025)
Block-Attention for Efficient Prefilling
by: Ma, Dongyang, et al.
Published: (2024)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)
HSR-Enhanced Sparse Attention Acceleration
by: Chen, Bo, et al.
Published: (2024)
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models
by: Zhao, Zilong, et al.
Published: (2024)
Sparse Attention Regression Network Based Soil Fertility Prediction With Ummaso
by: Rao, R V Raghavendra, et al.
Published: (2024)
Improving Sparse Autoencoder with Dynamic Attention
by: Wang, Dongsheng, et al.
Published: (2026)
NOSA: Native and Offloadable Sparse Attention
by: Huang, Yuxiang, et al.
Published: (2025)
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
by: Dev, Arundhathi, et al.
Published: (2026)
Sparse Low-Ranked Self-Attention Transformer for Remaining Useful Lifetime Prediction of Optical Fiber Amplifiers
by: Schneider, Dominic, et al.
Published: (2024)
SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
by: Xu, Hongtao, et al.
Published: (2026)
AgentOCR: Reimagining Agent History via Optical Self-Compression
by: Feng, Lang, et al.
Published: (2026)
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
by: Zhou, Ruijie, et al.
Published: (2026)
Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Self-Attention Transformers
by: Karbevski, Marko, et al.
Published: (2025)
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)
Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference
by: Le, Hoang Anh Duy, et al.
Published: (2026)
Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
by: Wang, Yuanzhe, et al.
Published: (2026)
S2O: Early Stopping for Sparse Attention via Online Permutation
by: Zhang, Yu, et al.
Published: (2026)
Spatial-Temporal Attention Model for Traffic State Estimation with Sparse Internet of Vehicles
by: Xue, Jianzhe, et al.
Published: (2024)
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference
by: Zhao, Yushu, et al.
Published: (2025)
STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios
by: Xu, Dongyang, et al.
Published: (2024)
Towards Robust Knowledge Tracing Models via k-Sparse Attention
by: Huang, Shuyan, et al.
Published: (2024)
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
by: Yang, Lijie, et al.
Published: (2024)
Sparse Adapter Fusion for Continual Learning in NLP
by: Zeng, Min, et al.
Published: (2026)
Double-P: Hierarchical Top-P Sparse Attention for Long-Context LLMs
by: Ni, Wentao, et al.
Published: (2026)
A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance
by: Tzachristas, Georgios, et al.
Published: (2025)
Hierarchical Sparse Plus Low Rank Compression of LLM
by: Kumar, Pawan, et al.
Published: (2025)
NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling
by: Grooten, Bram, et al.
Published: (2025)
SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)
Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving
by: Gao, Wei, et al.
Published: (2025)
Dual-Prototype Disentanglement: A Context-Aware Enhancement Framework for Time Series Forecasting
by: Yang, Haonan, et al.
Published: (2026)
GL-LFGNN:A Global-Local Dual-branch Causal Graph Neural Network Based on Liang-Kleeman Information Flow for EEG Emotion Recognition
by: Wang, Ziyi, et al.
Published: (2026)
EF-LLM: Energy Forecasting LLM with AI-assisted Automation, Enhanced Sparse Prediction, Hallucination Detection
by: Qiu, Zihang, et al.
Published: (2024)
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
by: Zhao, Yilong, et al.
Published: (2025)