Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.05402 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866909580373524480 |
|---|---|
| author | Chavez, Victor Fonte Esteves, Claudia Hayet, Jean-Bernard |
| author_facet | Chavez, Victor Fonte Esteves, Claudia Hayet, Jean-Bernard |
| contents | In this work, we propose a new diffusion-based method for video frame interpolation (VFI), in the context of traditional hand-made animation. We introduce three main contributions: The first is that we explicitly handle the interpolation time in our model, which we also re-estimate during the training process, to cope with the particularly large variations observed in the animation domain, compared to natural videos; The second is that we adapt and generalize a diffusion scheme called ResShift recently proposed in the super-resolution community to VFI, which allows us to perform a very low number of diffusion steps (in the order of 10) to produce our estimates; The third is that we leverage the stochastic nature of the diffusion process to provide a pixel-wise estimate of the uncertainty on the interpolated frame, which could be useful to anticipate where the model may be wrong. We provide extensive comparisons with respect to state-of-the-art models and show that our model outperforms these models on animation videos. Our code is available at https://github.com/VicFonch/Multi-Input-Resshift-Diffusion-VFI. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2504_05402 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Time-adaptive Video Frame Interpolation based on Residual Diffusion Chavez, Victor Fonte Esteves, Claudia Hayet, Jean-Bernard Computer Vision and Pattern Recognition In this work, we propose a new diffusion-based method for video frame interpolation (VFI), in the context of traditional hand-made animation. We introduce three main contributions: The first is that we explicitly handle the interpolation time in our model, which we also re-estimate during the training process, to cope with the particularly large variations observed in the animation domain, compared to natural videos; The second is that we adapt and generalize a diffusion scheme called ResShift recently proposed in the super-resolution community to VFI, which allows us to perform a very low number of diffusion steps (in the order of 10) to produce our estimates; The third is that we leverage the stochastic nature of the diffusion process to provide a pixel-wise estimate of the uncertainty on the interpolated frame, which could be useful to anticipate where the model may be wrong. We provide extensive comparisons with respect to state-of-the-art models and show that our model outperforms these models on animation videos. Our code is available at https://github.com/VicFonch/Multi-Input-Resshift-Diffusion-VFI. |
| title | Time-adaptive Video Frame Interpolation based on Residual Diffusion |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2504.05402 |