:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Siran; Cao, Zane; He, Yongchao
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2512.14082

Similar Items

HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
by: Liu, Siran, et al.
Published: (2025)

ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification
by: Liu, Siran, et al.
Published: (2026)

RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference
by: Liu, Siran, et al.
Published: (2026)

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)

SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving
by: Zhao, Bohan, et al.
Published: (2025)

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
by: Shen, Zhenyi, et al.
Published: (2025)

Less Is More: Fast and Accurate Reasoning with Cross-Head Unified Sparse Attention
by: Yang, Lijie, et al.
Published: (2025)

Rectified Sparse Attention
by: Sun, Yutao, et al.
Published: (2025)

Multi-Granularity Open Intent Classification via Adaptive Granular-Ball Decision Boundary
by: Li, Yanhua, et al.
Published: (2024)

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
by: Gao, Yizhao, et al.
Published: (2024)

CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning
by: Fan, Zhenxuan, et al.
Published: (2026)

Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)

UniICL: An Efficient Unified Framework Unifying Compression, Selection, and Generation
by: Gao, Jun, et al.
Published: (2024)

R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
by: Wang, Yibo, et al.
Published: (2025)

MGSA: Multi-Granularity Graph Structure Attention for Knowledge Graph-to-Text Generation
by: Wang, Shanshan, et al.
Published: (2024)

ProxyAttn: Guided Sparse Attention via Representative Heads
by: Wang, Yixuan, et al.
Published: (2025)

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping
by: Chen, Yao, et al.
Published: (2026)

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
by: Piękos, Piotr, et al.
Published: (2025)

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
by: Yang, Shang, et al.
Published: (2025)

Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)

SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)

AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation
by: Luo, Lvzhou, et al.
Published: (2025)

Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation
by: Li, Mingzhe, et al.
Published: (2022)

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)

DTELS: Towards Dynamic Granularity of Timeline Summarization
by: Zhang, Chenlong, et al.
Published: (2024)

FreeChunker: A Cross-Granularity Chunking Framework
by: Zhang, Wenxuan, et al.
Published: (2025)

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference
by: Yan, Siyuan, et al.
Published: (2025)

SparseD: Sparse Attention for Diffusion Language Models
by: Wang, Zeqing, et al.
Published: (2025)

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression
by: Cunegatti, Elia, et al.
Published: (2026)

ReCode: Unify Plan and Action for Universal Granularity Control
by: Yu, Zhaoyang, et al.
Published: (2025)

Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space
by: Figliolia, Tomas, et al.
Published: (2025)

Multi-Granularity Semantic Revision for Large Language Model Distillation
by: Liu, Xiaoyu, et al.
Published: (2024)

Lag-Relative Sparse Attention In Long Context Training
by: Liang, Manlai, et al.
Published: (2025)

Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts
by: Chen, Yingfa, et al.
Published: (2024)

Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model
by: Cao, Xi, et al.
Published: (2024)

CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring
by: Su, Jiamin, et al.
Published: (2025)

Multi-Granularity Guided Fusion-in-Decoder
by: Choi, Eunseong, et al.
Published: (2024)

FASA: Frequency-aware Sparse Attention
by: Wang, Yifei, et al.
Published: (2026)

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)

Dimensional Collapse in Transformer Attention Outputs: A Challenge for Sparse Dictionary Learning
by: Wang, Junxuan, et al.
Published: (2025)