:: Library Catalog

Bibliographic Details
Main Authors:	Shen, Xuan; Han, Chenxia; Zhou, Yufa; Xie, Yanyue; Gong, Yifan; Wang, Quanyi; Wang, Yiwei; Wang, Yanzhi; Zhao, Pu; Gu, Jiuxiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.14708

Similar Items

FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
by: Shen, Xuan, et al.
Published: (2025)

Efficient Reasoning with Hidden Thinking
by: Shen, Xuan, et al.
Published: (2025)

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
by: Shen, Xuan, et al.
Published: (2024)

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
by: Shen, Xuan, et al.
Published: (2025)

Squat: Quant Small Language Models on the Edge
by: Shen, Xuan, et al.
Published: (2024)

HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression
by: Lu, Lei, et al.
Published: (2024)

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
by: Zhan, Zheng, et al.
Published: (2024)

OmniMem: Scalable and Adaptive Memory Retrieval for Long Video Generation
by: Zhao, Lin, et al.
Published: (2026)

Numerical Pruning for Efficient Autoregressive Models
by: Shen, Xuan, et al.
Published: (2024)

Collaborative Compression for Large-Scale MoE Deployment on Edge
by: Chen, Yixiao, et al.
Published: (2025)

Differentially Private Attention Computation
by: Gao, Yeqi, et al.
Published: (2023)

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
by: Lv, Chengtao, et al.
Published: (2026)

Rethinking Token Reduction for State Space Models
by: Zhan, Zheng, et al.
Published: (2024)

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
by: Zhou, Xingyu, et al.
Published: (2024)

Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)

GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
by: Wang, Jiangshan, et al.
Published: (2024)

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)

FastFace: Tuning Identity Preservation in Distilled Diffusion via Guidance and Attention
by: Karpukhin, Sergey, et al.
Published: (2025)

METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
by: Li, Bingxuan, et al.
Published: (2025)

SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
by: Fang, Tongcheng, et al.
Published: (2026)

Search for Efficient Large Language Models
by: Shen, Xuan, et al.
Published: (2024)

FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs
by: Lin, Haoran, et al.
Published: (2024)

Understanding and Improving Training-free Loss-based Diffusion Guidance
by: Shen, Yifei, et al.
Published: (2024)

Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
by: Chen, Dar-Yen, et al.
Published: (2025)

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
by: Liang, Yingyu, et al.
Published: (2024)

Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)

Fast Solve of Broadband Electromagnetic Scattering Problems Based on Krylov Subspace Basis Functions Combining With Compressive Sensing
by: Wang, Zhonggen, et al.
Published: (2025)

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
by: Xu, Dejia, et al.
Published: (2024)

DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination
by: Gong, Xuan, et al.
Published: (2024)

HDCompression: Hybrid-Diffusion Image Compression for Ultra-Low Bitrates
by: Lu, Lei, et al.
Published: (2025)

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)

Understanding Attention Mechanism in Video Diffusion Models
by: Liu, Bingyan, et al.
Published: (2025)

Exploring Token Pruning in Vision State Space Models
by: Zhan, Zheng, et al.
Published: (2024)

Attention Beats Linear for Fast Implicit Neural Representation Generation
by: Zhang, Shuyi, et al.
Published: (2024)

Re-Attentional Controllable Video Diffusion Editing
by: Wang, Yuanzhi, et al.
Published: (2024)

Demystify Mamba in Vision: A Linear Attention Perspective
by: Han, Dongchen, et al.
Published: (2024)

STDAN: Deformable Attention Network for Space-Time Video Super-Resolution
by: Wang, Hai, et al.
Published: (2022)

Fast Cross-Operator Optimization of Attention Dataflow
by: Chang, Haodong, et al.
Published: (2026)

Pruning Foundation Models for High Accuracy without Retraining
by: Zhao, Pu, et al.
Published: (2024)