Bibliographic Details
Main Authors: Zhu, Jinchao, Wang, Yuxuan, Pan, Siyuan, Wan, Pengfei, Zhang, Di, Huang, Gao
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2406.00210
author Zhu, Jinchao
Wang, Yuxuan
Pan, Siyuan
Wan, Pengfei
Zhang, Di
Huang, Gao
contents The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantization, these approaches typically maintain the original network architecture. The extensive parameter scale and substantial computational demands have limited research into adjusting the model architecture. This study focuses on reducing redundant computation in SDM and optimizes the model through both tuning and tuning-free methods. 1) For the tuning method, we first design a model assembly strategy to reconstruct a lightweight model while preserving performance through distillation. Second, to mitigate performance loss due to pruning, we incorporate multi-expert conditional convolution (ME-CondConv) into compressed UNets, increasing capacity and thereby network performance without sacrificing speed. Third, we validate the effectiveness of the multi-UNet switching method for improving network speed. 2) For the tuning-free method, we propose a feature inheritance strategy to accelerate inference by skipping local computations at the block, layer, or unit level within the network structure. We also examine multiple sampling modes for feature inheritance at the time-step level. Experiments demonstrate that both the proposed tuning and tuning-free methods can improve the speed and performance of the SDM. The lightweight model reconstructed by the model assembly strategy increases generation speed by 22.4%, while the feature inheritance strategy increases SDM generation speed by 40.0%.
format Preprint
id arxiv_https___arxiv_org_abs_2406_00210
institution arXiv
publishDate 2024
record_format arxiv
title A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2406.00210