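Staff View: SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning :: Library Catalog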

Bibliographic Details
Main Authors:	Peng, Liang; Cheng, Haoran; Yang, Zheng; Zhao, Ruisi; Xia, Linxuan; Song, Chaotian; Lu, Qinglin; Wu, Boxi; Liu, Wei
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2311.17536

_version_	1866910319361654784
author	Peng, Liang; Cheng, Haoran; Yang, Zheng; Zhao, Ruisi; Xia, Linxuan; Song, Chaotian; Lu, Qinglin; Wu, Boxi; Liu, Wei
author_facet	Peng, Liang; Cheng, Haoran; Yang, Zheng; Zhao, Ruisi; Xia, Linxuan; Song, Chaotian; Lu, Qinglin; Wu, Boxi; Liu, Wei
contents	Recent one-shot video tuning methods, which fine-tune the network on a specific video based on pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of the flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. This constraint aims to regulate noise predictions across their temporal neighbors, resulting in smooth latents. It can be simply included as a loss term during the training phase. By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. The source codes and video demos are available at https://github.com/SPengLiang/SmoothVideo.
format	Preprint
id	arxiv_https___arxiv_org_abs_2311_17536
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning Peng, Liang Cheng, Haoran Yang, Zheng Zhao, Ruisi Xia, Linxuan Song, Chaotian Lu, Qinglin Wu, Boxi Liu, Wei Computer Vision and Pattern Recognition Recent one-shot video tuning methods, which fine-tune the network on a specific video based on pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of the flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. This constraint aims to regulate noise predictions across their temporal neighbors, resulting in smooth latents. It can be simply included as a loss term during the training phase. By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. The source codes and video demos are available at https://github.com/SPengLiang/SmoothVideo.
title	SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2311.17536

Similar Items