Library Catalog

Bibliographic Details
Main Authors:	Zhang, Chaoran; Zou, Lixin; Luo, Dan; Tang, Min; Luo, Xiangyang; Li, Zihao; Li, Chenliang
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2407.02328

Similar Items

LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation
by: Sun, Yang, et al.
Published: (2025)

Flow Matching based Sequential Recommender Model
by: Liu, Feng, et al.
Published: (2025)

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)

STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
by: Tang, Zecheng, et al.
Published: (2026)

AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation
by: Luo, Lvzhou, et al.
Published: (2025)

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
by: Wang, Hanrui, et al.
Published: (2020)

SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
by: Wang, Zekun, et al.
Published: (2023)

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
by: Luo, Chuwei, et al.
Published: (2022)

Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
by: Chen, Yu, et al.
Published: (2026)

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
by: Xiao, Emily, et al.
Published: (2025)

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
by: Lai, Xunhao, et al.
Published: (2025)

Model Unlearning via Sparse Autoencoder Subspace Guided Projections
by: Wang, Xu, et al.
Published: (2025)

AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding
by: Luo, Shuqing, et al.
Published: (2025)

FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination
by: Zhou, Pengfei, et al.
Published: (2024)

Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation
by: Li, Shiwei, et al.
Published: (2025)

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
by: Shen, Zhenyi, et al.
Published: (2025)

BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization
by: Hagos, Desta Haileselassie, et al.
Published: (2025)

SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
by: Li, Zheng, et al.
Published: (2025)

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)

MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
by: Li, Wenxuan, et al.
Published: (2025)

CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning
by: Lu, Zhiyuan, et al.
Published: (2026)

Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
by: Li, Chen, et al.
Published: (2025)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)

From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
by: Zhu, Mingcheng, et al.
Published: (2026)

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)

AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control
by: Li, Ruosen, et al.
Published: (2025)

AdaSplash: Adaptive Sparse Flash Attention
by: Gonçalves, Nuno, et al.
Published: (2025)

Rectified Sparse Attention
by: Sun, Yutao, et al.
Published: (2025)

When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning
by: Xu, Ruotao, et al.
Published: (2026)

Trainable Dynamic Mask Sparse Attention
by: Shi, Jingze, et al.
Published: (2025)

Efficient Vision-Language Reasoning via Adaptive Token Pruning
by: Li, Xue, et al.
Published: (2025)

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference
by: Yan, Siyuan, et al.
Published: (2025)

Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
by: Li, Zihao, et al.
Published: (2025)

Token-weighted Direct Preference Optimization with Attention
by: Huang, Chengyu, et al.
Published: (2026)