| Main Authors: | Liu, Siran, Cao, Zane, He, Yongchao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.14082 |
Similar Items
HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
by: Liu, Siran, et al.
Published: (2025)
ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification
by: Liu, Siran, et al.
Published: (2026)
RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference
by: Liu, Siran, et al.
Published: (2026)
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)
SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving
by: Zhao, Bohan, et al.
Published: (2025)
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
by: Shen, Zhenyi, et al.
Published: (2025)
Less Is More: Fast and Accurate Reasoning with Cross-Head Unified Sparse Attention
by: Yang, Lijie, et al.
Published: (2025)
Rectified Sparse Attention
by: Sun, Yutao, et al.
Published: (2025)
Multi-Granularity Open Intent Classification via Adaptive Granular-Ball Decision Boundary
by: Li, Yanhua, et al.
Published: (2024)
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
by: Gao, Yizhao, et al.
Published: (2024)
CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning
by: Fan, Zhenxuan, et al.
Published: (2026)
Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)
UniICL: An Efficient Unified Framework Unifying Compression, Selection, and Generation
by: Gao, Jun, et al.
Published: (2024)
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
by: Wang, Yibo, et al.
Published: (2025)
MGSA: Multi-Granularity Graph Structure Attention for Knowledge Graph-to-Text Generation
by: Wang, Shanshan, et al.
Published: (2024)
ProxyAttn: Guided Sparse Attention via Representative Heads
by: Wang, Yixuan, et al.
Published: (2025)
Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping
by: Chen, Yao, et al.
Published: (2026)
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
by: Piękos, Piotr, et al.
Published: (2025)
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
by: Yang, Shang, et al.
Published: (2025)
Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)
SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)
AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation
by: Luo, Lvzhou, et al.
Published: (2025)
Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation
by: Li, Mingzhe, et al.
Published: (2022)
Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)
DTELS: Towards Dynamic Granularity of Timeline Summarization
by: Zhang, Chenlong, et al.
Published: (2024)
FreeChunker: A Cross-Granularity Chunking Framework
by: Zhang, Wenxuan, et al.
Published: (2025)
Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference
by: Yan, Siyuan, et al.
Published: (2025)
SparseD: Sparse Attention for Diffusion Language Models
by: Wang, Zeqing, et al.
Published: (2025)
From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression
by: Cunegatti, Elia, et al.
Published: (2026)
ReCode: Unify Plan and Action for Universal Granularity Control
by: Yu, Zhaoyang, et al.
Published: (2025)
Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space
by: Figliolia, Tomas, et al.
Published: (2025)
Multi-Granularity Semantic Revision for Large Language Model Distillation
by: Liu, Xiaoyu, et al.
Published: (2024)
Lag-Relative Sparse Attention In Long Context Training
by: Liang, Manlai, et al.
Published: (2025)
Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts
by: Chen, Yingfa, et al.
Published: (2024)
Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model
by: Cao, Xi, et al.
Published: (2024)
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring
by: Su, Jiamin, et al.
Published: (2025)
Multi-Granularity Guided Fusion-in-Decoder
by: Choi, Eunseong, et al.
Published: (2024)
FASA: Frequency-aware Sparse Attention
by: Wang, Yifei, et al.
Published: (2026)
HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)
Dimensional Collapse in Transformer Attention Outputs: A Challenge for Sparse Dictionary Learning
by: Wang, Junxuan, et al.
Published: (2025)