| Main Authors: | Shen, Xuan, Han, Chenxia, Zhou, Yufa, Xie, Yanyue, Gong, Yifan, Wang, Quanyi, Wang, Yiwei, Wang, Yanzhi, Zhao, Pu, Gu, Jiuxiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.14708 |
Similar Items
FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
by: Shen, Xuan, et al.
Published: (2025)
Efficient Reasoning with Hidden Thinking
by: Shen, Xuan, et al.
Published: (2025)
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
by: Shen, Xuan, et al.
Published: (2024)
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
by: Shen, Xuan, et al.
Published: (2025)
Squat: Quant Small Language Models on the Edge
by: Shen, Xuan, et al.
Published: (2024)
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression
by: Lu, Lei, et al.
Published: (2024)
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
by: Zhan, Zheng, et al.
Published: (2024)
OmniMem: Scalable and Adaptive Memory Retrieval for Long Video Generation
by: Zhao, Lin, et al.
Published: (2026)
Numerical Pruning for Efficient Autoregressive Models
by: Shen, Xuan, et al.
Published: (2024)
Collaborative Compression for Large-Scale MoE Deployment on Edge
by: Chen, Yixiao, et al.
Published: (2025)
Differentially Private Attention Computation
by: Gao, Yeqi, et al.
Published: (2023)
Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
by: Lv, Chengtao, et al.
Published: (2026)
Rethinking Token Reduction for State Space Models
by: Zhan, Zheng, et al.
Published: (2024)
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
by: Zhou, Xingyu, et al.
Published: (2024)
Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)
GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
by: Wang, Jiangshan, et al.
Published: (2024)
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)
FastFace: Tuning Identity Preservation in Distilled Diffusion via Guidance and Attention
by: Karpukhin, Sergey, et al.
Published: (2025)
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
by: Li, Bingxuan, et al.
Published: (2025)
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
by: Fang, Tongcheng, et al.
Published: (2026)
Search for Efficient Large Language Models
by: Shen, Xuan, et al.
Published: (2024)
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs
by: Lin, Haoran, et al.
Published: (2024)
Understanding and Improving Training-free Loss-based Diffusion Guidance
by: Shen, Yifei, et al.
Published: (2024)
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
by: Chen, Dar-Yen, et al.
Published: (2025)
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
by: Liang, Yingyu, et al.
Published: (2024)
Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)
Fast Solve of Broadband Electromagnetic Scattering Problems Based on Krylov Subspace Basis Functions Combining With Compressive Sensing
by: Zhonggen Wang, et al.
Published: (2025)
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
by: Xu, Dejia, et al.
Published: (2024)
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination
by: Gong, Xuan, et al.
Published: (2024)
HDCompression: Hybrid-Diffusion Image Compression for Ultra-Low Bitrates
by: Lu, Lei, et al.
Published: (2025)
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)
Understanding Attention Mechanism in Video Diffusion Models
by: Liu, Bingyan, et al.
Published: (2025)
Exploring Token Pruning in Vision State Space Models
by: Zhan, Zheng, et al.
Published: (2024)
Attention Beats Linear for Fast Implicit Neural Representation Generation
by: Zhang, Shuyi, et al.
Published: (2024)
Re-Attentional Controllable Video Diffusion Editing
by: Wang, Yuanzhi, et al.
Published: (2024)
Demystify Mamba in Vision: A Linear Attention Perspective
by: Han, Dongchen, et al.
Published: (2024)
STDAN: Deformable Attention Network for Space-Time Video Super-Resolution
by: Wang, Hai, et al.
Published: (2022)
Fast Cross-Operator Optimization of Attention Dataflow
by: Chang, Haodong, et al.
Published: (2026)
Pruning Foundation Models for High Accuracy without Retraining
by: Zhao, Pu, et al.
Published: (2024)