| Main Authors: | Zhang, Chaoran, Zou, Lixin, Luo, Dan, Tang, Min, Luo, Xiangyang, Li, Zihao, Li, Chenliang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.02328 |
Similar Items
LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation
by: Sun, Yang, et al.
Published: (2025)
Flow Matching based Sequential Recommender Model
by: Liu, Feng, et al.
Published: (2025)
HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)
STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
by: Tang, Zecheng, et al.
Published: (2026)
AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation
by: Luo, Lvzhou, et al.
Published: (2025)
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
by: Wang, Hanrui, et al.
Published: (2020)
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
by: Wang, Zekun, et al.
Published: (2023)
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
by: Luo, Chuwei, et al.
Published: (2022)
Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
by: Chen, Yu, et al.
Published: (2026)
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)
Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
by: Xiao, Emily, et al.
Published: (2025)
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
by: Lai, Xunhao, et al.
Published: (2025)
Model Unlearning via Sparse Autoencoder Subspace Guided Projections
by: Wang, Xu, et al.
Published: (2025)
AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding
by: Luo, Shuqing, et al.
Published: (2025)
FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination
by: Zhou, Pengfei, et al.
Published: (2024)
Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation
by: Li, Shiwei, et al.
Published: (2025)
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
by: Shen, Zhenyi, et al.
Published: (2025)
BiSparse-AAS: Bilinear Sparse Attention and Adaptive Spans Framework for Scalable and Efficient Text Summarization
by: Hagos, Desta Haileselassie, et al.
Published: (2025)
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
by: Li, Zheng, et al.
Published: (2025)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)
Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)
MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
by: Li, Wenxuan, et al.
Published: (2025)
CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning
by: Lu, Zhiyuan, et al.
Published: (2026)
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
by: Li, Chen, et al.
Published: (2025)
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
by: Zhu, Mingcheng, et al.
Published: (2026)
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control
by: Li, Ruosen, et al.
Published: (2025)
AdaSplash: Adaptive Sparse Flash Attention
by: Gonçalves, Nuno, et al.
Published: (2025)
Rectified Sparse Attention
by: Sun, Yutao, et al.
Published: (2025)
When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning
by: Xu, Ruotao, et al.
Published: (2026)
Trainable Dynamic Mask Sparse Attention
by: Shi, Jingze, et al.
Published: (2025)
Efficient Vision-Language Reasoning via Adaptive Token Pruning
by: Li, Xue, et al.
Published: (2025)
Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)
Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference
by: Yan, Siyuan, et al.
Published: (2025)
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
by: Li, Zihao, et al.
Published: (2025)
Token-weighted Direct Preference Optimization with Attention
by: Huang, Chengyu, et al.
Published: (2026)