Staff View: See Further When Clear: Curriculum Consistency Model :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yunpeng; Liu, Boxiao; Zhang, Yi; Hou, Xingzhong; Song, Guanglu; Liu, Yu; You, Haihang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.06295

_version_	1866917862845710336
author	Liu, Yunpeng; Liu, Boxiao; Zhang, Yi; Hou, Xingzhong; Song, Guanglu; Liu, Yu; You, Haihang
author_facet	Liu, Yunpeng; Liu, Boxiao; Zhang, Yi; Hou, Xingzhong; Song, Guanglu; Liu, Yu; You, Haihang
contents	Significant advances have been made in the sampling efficiency of diffusion models and flow matching models, driven by Consistency Distillation (CD), which trains a student model to mimic the output of a teacher model at a later timestep. However, we found that the learning complexity of the student model varies significantly across different timesteps, leading to suboptimal performance in CD. To address this issue, we propose the Curriculum Consistency Model (CCM), which stabilizes and balances the learning complexity across timesteps. Specifically, we regard the distillation process at each timestep as a curriculum and introduce a metric based on Peak Signal-to-Noise Ratio (PSNR) to quantify the learning complexity of this curriculum, then ensure that the curriculum maintains consistent learning complexity across different timesteps by having the teacher model iterate more steps when the noise intensity is low. Our method achieves competitive single-step sampling Fréchet Inception Distance (FID) scores of 1.64 on CIFAR-10 and 2.18 on ImageNet 64x64. Moreover, we have extended our method to large-scale text-to-image models and confirmed that it generalizes well to both diffusion models (Stable Diffusion XL) and flow matching models (Stable Diffusion 3). The generated samples demonstrate improved image-text alignment and semantic structure, since CCM enlarges the distillation step at large timesteps and reduces the accumulated error.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_06295
institution	arXiv
publishDate	2024
record_format	arxiv
title	See Further When Clear: Curriculum Consistency Model
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2412.06295

Similar Items