Bibliographic Details
Main Authors: Adiya, Tserendorj, Yoon, Jae Shin, Lee, Jungeun, Kim, Sanghun, Lim, Hwasup
Format: Preprint
Published: 2023
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2307.00574
author Adiya, Tserendorj
Yoon, Jae Shin
Lee, Jungeun
Kim, Sanghun
Lim, Hwasup
contents We introduce a method to generate temporally coherent human animation from a single image, a video, or random noise. This problem has been formulated as auto-regressive generation, i.e., regressing past frames to decode future frames. However, such unidirectional generation is highly prone to motion drift over time, producing unrealistic human animation with significant artifacts such as appearance distortion. We claim that bidirectional temporal modeling enforces temporal coherence on a generative network by largely suppressing the motion ambiguity of human appearance. To prove our claim, we design a novel human animation framework using a denoising diffusion model: a neural network learns to generate the image of a person by denoising temporal Gaussian noise, with intermediate results cross-conditioned bidirectionally between consecutive frames. In experiments, our method demonstrates strong performance compared to existing unidirectional approaches, with realistic temporal coherence.
format Preprint
id arxiv_https___arxiv_org_abs_2307_00574
institution arXiv
publishDate 2023
record_format arxiv
title Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2307.00574