Bibliographic Details
Main Authors: Zou, Chang, Li, Changlin, Li, Yang, Li, Patrol, Wu, Jianbing, He, Xiao, Liu, Songtao, Zhong, Zhao, Huang, Kailin, Zhang, Linfeng
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.05449
author Zou, Chang
Li, Changlin
Li, Yang
Li, Patrol
Wu, Jianbing
He, Xiao
Liu, Songtao
Zhong, Zhao
Huang, Kailin
Zhang, Linfeng
contents While diffusion models have achieved great success in video generation, this progress comes with a rapidly escalating computational burden. Among existing acceleration methods, feature caching is popular because it is training-free and delivers considerable speedups, but it inevitably loses semantics and detail as the compression ratio increases. Another widely adopted method, training-aware step distillation, succeeds in image generation but also degrades drastically in video generation when reduced to only a few sampling steps. Moreover, simply applying training-free feature caching to step-distilled models makes the quality loss even more severe, because the sampling steps are sparser. This paper introduces, for the first time, a distillation-compatible learnable feature caching mechanism. Instead of the traditional training-free heuristics used for diffusion models, we employ a lightweight learnable neural predictor that captures the high-dimensional feature evolution process more accurately. We further examine the challenges of highly compressed distillation on large-scale video models and propose a conservative Restricted MeanFlow approach for more stable and lossless distillation. Together, these techniques push the acceleration boundary to $11.8\times$ while preserving generation quality. Extensive experiments demonstrate the effectiveness of our method. Code is publicly available: https://github.com/Tencent-Hunyuan/DisCa
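The abstract contrasts training-free feature caching, which reuses features at skipped steps, with a learned predictor that forecasts those features instead. The sketch below is a toy illustration of that contrast under stated assumptions. It is not DisCa's predictor. Per the abstract, DisCa uses a lightweight neural network on high-dimensional video DiT features. Here, a hypothetical `expensive_block` returns a three-dimensional toy feature. The "learnable" part is also much simpler: one least-squares weight per skipped-step offset, which extrapolates from the last two fully computed features and is fit in-sample on a single full-compute run. All names (`expensive_block`, `fit_predictor`, `sample`, `mean_abs_error`) are illustrative and do not come from the paper or its repository.

```python
import math


def expensive_block(t):
    """Hypothetical stand-in for a costly transformer block: a smooth toy feature at step t."""
    return [math.sin(0.5 * t), 1.0 - math.exp(-t), 0.3 * t]


def fit_predictor(traj, interval):
    """Fit one weight w_k per offset k by least squares:
    f[j + k] - f[j] ~ w_k * (f[j] - f[j - interval]), with j a refresh step."""
    weights = []
    for k in range(1, interval):
        num = den = 0.0
        for j in range(interval, len(traj) - k, interval):
            for cur, ahead, prev in zip(traj[j], traj[j + k], traj[j - interval]):
                d = cur - prev
                num += d * (ahead - cur)
                den += d * d
        weights.append(num / den)
    return weights


def sample(timesteps, interval, weights=None):
    """Run the toy loop, fully computing features only every `interval` steps.

    weights=None: training-free caching (reuse the last computed feature).
    weights=[w_1, ...]: forecast skipped steps from the last two computed features.
    """
    feats, calls = [], 0
    last = prev = None  # the two most recent fully computed features
    for i, t in enumerate(timesteps):
        k = i % interval
        if k == 0:
            prev, last = last, expensive_block(t)
            calls += 1
            feats.append(last)
        elif weights is None or prev is None:
            feats.append(last)  # plain reuse (also the fallback before two refreshes exist)
        else:
            w = weights[k - 1]
            feats.append([x + w * (x - p) for x, p in zip(last, prev)])
    return feats, calls


def mean_abs_error(feats, reference):
    diffs = [abs(x - y) for f, r in zip(feats, reference) for x, y in zip(f, r)]
    return sum(diffs) / len(diffs)


timesteps = [0.05 * i for i in range(40)]
reference, full_calls = sample(timesteps, interval=1)  # no caching: 40 block calls
weights = fit_predictor(reference, interval=4)  # toy in-sample "training"
reuse_feats, reuse_calls = sample(timesteps, interval=4)
pred_feats, pred_calls = sample(timesteps, interval=4, weights=weights)
print(f"calls: full={full_calls}, reuse={reuse_calls}, predictor={pred_calls}")
print(f"MAE vs. full: reuse={mean_abs_error(reuse_feats, reference):.4f}, "
      f"predictor={mean_abs_error(pred_feats, reference):.4f}")
```

Both cached runs call `expensive_block` the same number of times. The comparison therefore isolates how the skipped features are filled in. The forecaster also uses only fully computed features as inputs, so its prediction errors do not compound across refresh intervals. A real learned cache would be trained on many trajectories rather than fit in-sample as in this sketch.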
format Preprint
id arxiv_https___arxiv_org_abs_2602_05449
institution arXiv
publishDate 2026
record_format arxiv
title DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2602.05449