Staff View: Time-adaptive Video Frame Interpolation based on Residual Diffusion :: Library Catalog

Bibliographic Details
Main Authors:	Chavez, Victor Fonte; Esteves, Claudia; Hayet, Jean-Bernard
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.05402

_version_	1866909580373524480
author	Chavez, Victor Fonte; Esteves, Claudia; Hayet, Jean-Bernard
author_facet	Chavez, Victor Fonte; Esteves, Claudia; Hayet, Jean-Bernard
contents	In this work, we propose a new diffusion-based method for video frame interpolation (VFI) in the context of traditional hand-made animation. We introduce three main contributions. First, we explicitly handle the interpolation time in our model, which we also re-estimate during the training process, to cope with the particularly large variations observed in the animation domain compared to natural videos. Second, we adapt and generalize to VFI a diffusion scheme called ResShift, recently proposed in the super-resolution community, which allows us to produce our estimates with a very low number of diffusion steps (on the order of 10). Third, we leverage the stochastic nature of the diffusion process to provide a pixel-wise estimate of the uncertainty on the interpolated frame, which could be useful to anticipate where the model may be wrong. We provide extensive comparisons with state-of-the-art models and show that our model outperforms them on animation videos. Our code is available at https://github.com/VicFonch/Multi-Input-Resshift-Diffusion-VFI.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_05402
institution	arXiv
publishDate	2025
record_format	arxiv
title	Time-adaptive Video Frame Interpolation based on Residual Diffusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.05402
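
Note on the abstract's third contribution: the record describes a pixel-wise uncertainty estimate obtained from the stochastic nature of the diffusion process. One common way to realize this idea, sketched here as an assumption rather than as the authors' implementation, is to draw several stochastic samples of the interpolated frame and take the per-pixel standard deviation. The sampler below (`toy_stochastic_interpolator`) is a hypothetical stand-in, not the model from the paper; it blends two frames at time `t` and adds noise proportional to how much the frames disagree.

```python
import random
import statistics


def toy_stochastic_interpolator(frame_a, frame_b, t, rng, noise_scale=0.05):
    """Hypothetical stand-in for a stochastic sampler (NOT the paper's model).

    Returns a linear blend of the two frames at time t, perturbed by Gaussian
    noise whose scale grows with the per-pixel difference between the frames.
    """
    out = []
    for row_a, row_b in zip(frame_a, frame_b):
        out_row = []
        for a, b in zip(row_a, row_b):
            blend = (1.0 - t) * a + t * b
            out_row.append(blend + rng.gauss(0.0, noise_scale * abs(b - a)))
        out.append(out_row)
    return out


def pixelwise_mean_and_std(samples):
    """Per-pixel mean and population standard deviation over a list of frames."""
    height = len(samples[0])
    width = len(samples[0][0])
    mean = [
        [statistics.fmean([s[i][j] for s in samples]) for j in range(width)]
        for i in range(height)
    ]
    std = [
        [statistics.pstdev([s[i][j] for s in samples]) for j in range(width)]
        for i in range(height)
    ]
    return mean, std


# Two tiny 2x2 grayscale "frames": the top-left pixel is static, the rest move.
frame_a = [[0.2, 0.0], [0.5, 1.0]]
frame_b = [[0.2, 1.0], [0.1, 0.0]]

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [
    toy_stochastic_interpolator(frame_a, frame_b, t=0.5, rng=rng)
    for _ in range(16)
]
mean_frame, uncertainty_map = pixelwise_mean_and_std(samples)
```

In this toy setup the static pixel has zero spread across samples while the moving pixels do not, which is the kind of signal an uncertainty map is meant to expose. The number of samples (16) and the noise model are illustrative choices only.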

Similar Items