Staff View: DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

Bibliographic Details
Main Authors:	Zou, Chang; Li, Changlin; Li, Yang; Li, Patrol; Wu, Jianbing; He, Xiao; Liu, Songtao; Zhong, Zhao; Huang, Kailin; Zhang, Linfeng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.05449

_version_	1866917421978222592
author	Zou, Chang; Li, Changlin; Li, Yang; Li, Patrol; Wu, Jianbing; He, Xiao; Liu, Songtao; Zhong, Zhao; Huang, Kailin; Zhang, Linfeng
author_facet	Zou, Chang; Li, Changlin; Li, Yang; Li, Patrol; Wu, Jianbing; He, Xiao; Liu, Songtao; Zhong, Zhao; Huang, Kailin; Zhang, Linfeng
contents	While diffusion models have achieved great success in video generation, this progress comes with a rapidly escalating computational burden. Among existing acceleration methods, feature caching is popular because it is training-free and delivers considerable speedups, but it inevitably loses semantics and detail under further compression. Another widely adopted method, training-aware step distillation, though successful in image generation, also degrades drastically in video generation when only a few steps are used. Furthermore, the quality loss becomes more severe when training-free feature caching is simply applied to step-distilled models, because their sampling steps are sparser. This paper introduces, for the first time, a distillation-compatible learnable feature caching mechanism. We employ a lightweight learnable neural predictor in place of the traditional training-free heuristics for diffusion models, enabling a more accurate capture of the high-dimensional feature evolution process. Furthermore, we explore the challenges of highly compressed distillation on large-scale video models and propose a conservative Restricted MeanFlow approach to achieve more stable and lossless distillation. With these contributions, we push the acceleration boundary to $11.8\times$ while preserving generation quality. Extensive experiments demonstrate the effectiveness of our method. Code is publicly available: https://github.com/Tencent-Hunyuan/DisCa
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_05449
institution	arXiv
publishDate	2026
record_format	arxiv
title	DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2602.05449

Similar Items