:: Library Catalog

Bibliographic Details
Main Authors:	Yan, Siyuan, Jiang, Guo-Qing, Zhang, Yuchen, Ma, Xiaoxing, Zhu, Ran, Cao, Chun, Xu, Jingwei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.18413

Similar Items

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
by: Xiao, Guangxuan, et al.
Published: (2024)

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
by: Lai, Xunhao, et al.
Published: (2025)

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)

MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
by: Li, Wenxuan, et al.
Published: (2025)

S$^3$-Attention: Attention-Aligned Endogenous Retrieval for Memory-Bounded Long-Context Inference
by: Ma, Qingsen, et al.
Published: (2026)

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)

SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs
by: Vo, James
Published: (2024)

HyLRA: Hybrid Layer Reuse Attention for Efficient Long-Context Inference
by: Ai, Xuan, et al.
Published: (2026)

ZigzagAttention: Efficient Long-Context Inference with Exclusive Retrieval and Streaming Heads
by: Liu, Zhuorui, et al.
Published: (2025)

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)

Long-Context Generalization with Sparse Attention
by: Vasylenko, Pavlo, et al.
Published: (2025)

$π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
by: Liu, Dong, et al.
Published: (2025)

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
by: Gu, Zhuohan, et al.
Published: (2024)

Lag-Relative Sparse Attention In Long Context Training
by: Liang, Manlai, et al.
Published: (2025)

AsyncTLS: Efficient Generative LLM Inference with Asynchronous Two-level Sparse Attention
by: Hu, Yuxuan, et al.
Published: (2026)

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
by: Ling Team, et al.
Published: (2025)

Efficient Context Scaling with LongCat ZigZag Attention
by: Zhang, Chen, et al.
Published: (2025)

RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference
by: Liu, Siran, et al.
Published: (2026)

DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration
by: Zhang, Hanzhi, et al.
Published: (2025)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)

Training-free Context-adaptive Attention for Efficient Long Context Modeling
by: You, Zeng, et al.
Published: (2025)

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
by: Liu, Di, et al.
Published: (2024)

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
by: Xiao, Emily, et al.
Published: (2025)

SPLA: Block Sparse Plus Linear Attention for Long Context Modeling
by: Wang, Bailin, et al.
Published: (2026)

Squeezed Attention: Accelerating Long Context Length LLM Inference
by: Hooper, Coleman, et al.
Published: (2024)

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
by: Zhou, Yanke, et al.
Published: (2026)

Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
by: Zhan, Zhihao, et al.
Published: (2025)

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
by: Qiu, Quantong, et al.
Published: (2026)

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
by: Ge, Suyu, et al.
Published: (2024)

Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention
by: He, Ziwei, et al.
Published: (2023)

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
by: Huang, Yunpeng, et al.
Published: (2023)

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2024)

Multipole Attention for Efficient Long Context Reasoning
by: Hooper, Coleman, et al.
Published: (2025)

Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
by: Tang, Jiaming, et al.
Published: (2024)

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
by: Zarch, Hossein Entezari, et al.
Published: (2025)

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
by: Jiang, Huiqiang, et al.
Published: (2024)

MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models
by: Zhang, Junyang, et al.
Published: (2025)

Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
by: Aggarwal, Shubham, et al.
Published: (2026)