Bibliographic details
Main authors: Wu, Kuanting; Ota, Kei; Kanezaki, Asako
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2504.14535
author Wu, Kuanting
Ota, Kei
Kanezaki, Asako
contents Video Diffusion Models (VDMs) can generate high-quality videos, but often struggle with producing temporally coherent motion. Optical flow supervision is a promising approach to address this, with prior works commonly employing warping-based strategies that avoid explicit flow matching. In this work, we explore an alternative formulation, FlowLoss, which directly compares flow fields extracted from generated and ground-truth videos. To account for the unreliability of flow estimation under high-noise conditions in diffusion, we propose a noise-aware weighting scheme that modulates the flow loss across denoising steps. Experiments on robotic video datasets suggest that FlowLoss improves motion stability and accelerates convergence in early training stages. Our findings offer practical insights for incorporating motion-based supervision into noise-conditioned generative models.
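The abstract describes two ingredients: a loss that directly compares flow fields from generated and ground-truth videos, and a weight that shrinks that loss at noisy denoising steps. The sketch below illustrates the idea under stated assumptions. It is not the authors' implementation. The abstract names neither the flow estimator, the distance metric, nor the weighting function, so this sketch assumes flows are already extracted, uses mean endpoint error (EPE) as the distance, and uses the cosine-schedule signal fraction alpha_bar_t as a hypothetical noise-aware weight. All function names and the `flow_weight` hyperparameter are illustrative.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a flow field is a flat list of per-pixel
# displacement vectors (u, v). Real pipelines would use tensors produced
# by an optical flow estimator, which is out of scope here.
Flow = List[Tuple[float, float]]


def cosine_alpha_bar(t: int, num_steps: int, s: float = 0.008) -> float:
    """Signal fraction alpha_bar_t of the cosine noise schedule
    (Nichol & Dhariwal, 2021): 1.0 at t = 0, close to 0.0 at t = num_steps."""
    def f(step: int) -> float:
        return math.cos((step / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def endpoint_error(pred: Flow, target: Flow) -> float:
    """Mean Euclidean distance between corresponding flow vectors (EPE)."""
    if not pred or len(pred) != len(target):
        raise ValueError("flow fields must be non-empty and equally sized")
    total = sum(math.hypot(pu - tu, pv - tv)
                for (pu, pv), (tu, tv) in zip(pred, target))
    return total / len(pred)


def noise_aware_flow_loss(pred: Flow, target: Flow, t: int, num_steps: int) -> float:
    """Hypothetical noise-aware weighting: scale the flow error by alpha_bar_t,
    so heavily noised steps (where flow estimates are unreliable) count less."""
    return cosine_alpha_bar(t, num_steps) * endpoint_error(pred, target)


def total_loss(denoise_mse: float, flow_term: float, flow_weight: float = 0.1) -> float:
    """Add the weighted flow term to the standard denoising loss.
    flow_weight is an illustrative hyperparameter, not a value from the paper."""
    return denoise_mse + flow_weight * flow_term


if __name__ == "__main__":
    pred = [(1.0, 0.0), (0.0, 1.0)]      # flow from the generated clip
    target = [(0.0, 0.0), (0.0, 0.0)]    # flow from the ground-truth clip
    for t in (0, 500, 1000):
        print(t, round(noise_aware_flow_loss(pred, target, t, 1000), 4))
```

Because alpha_bar_t decreases monotonically from 1 toward 0, the flow term dominates near clean samples and fades out as noise grows. This is only one plausible schedule; the abstract says only that the weighting modulates the flow loss across denoising steps.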
format Preprint
id arxiv_https___arxiv_org_abs_2504_14535
institution arXiv
publishDate 2025
record_format arxiv
title FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2504.14535