Bibliographic Details
Main Authors: Peng, Liang, Cheng, Haoran, Yang, Zheng, Zhao, Ruisi, Xia, Linxuan, Song, Chaotian, Lu, Qinglin, Wu, Boxi, Liu, Wei
Format: Preprint
Published: 2023
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2311.17536
author Peng, Liang
Cheng, Haoran
Yang, Zheng
Zhao, Ruisi
Xia, Linxuan
Song, Chaotian
Lu, Qinglin
Wu, Boxi
Liu, Wei
contents Recent one-shot video tuning methods, which fine-tune a pre-trained text-to-image model (e.g., Stable Diffusion) on a single video, are popular in the community because of their flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. The constraint regulates noise predictions across temporally neighboring frames, resulting in smooth latents. It can be added as a loss term during training. Applying the loss to existing one-shot video tuning methods significantly improves the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. The source code and video demos are available at https://github.com/SPengLiang/SmoothVideo.
format Preprint
id arxiv_https___arxiv_org_abs_2311_17536
institution arXiv
publishDate 2023
record_format arxiv
title SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2311.17536