MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Kuanting; Ota, Kei; Kanezaki, Asako
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.14535

_version_	1866913801063890944
author	Wu, Kuanting; Ota, Kei; Kanezaki, Asako
author_facet	Wu, Kuanting; Ota, Kei; Kanezaki, Asako
contents	Video Diffusion Models (VDMs) can generate high-quality videos, but often struggle with producing temporally coherent motion. Optical flow supervision is a promising approach to address this, with prior works commonly employing warping-based strategies that avoid explicit flow matching. In this work, we explore an alternative formulation, FlowLoss, which directly compares flow fields extracted from generated and ground-truth videos. To account for the unreliability of flow estimation under high-noise conditions in diffusion, we propose a noise-aware weighting scheme that modulates the flow loss across denoising steps. Experiments on robotic video datasets suggest that FlowLoss improves motion stability and accelerates convergence in early training stages. Our findings offer practical insights for incorporating motion-based supervision into noise-conditioned generative models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_14535
institution	arXiv
publishDate	2025
record_format	arxiv
title	FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.14535
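
The contents field above describes comparing flow fields from generated and ground-truth videos and down-weighting that comparison at high noise levels. The following is a minimal sketch of that idea, not the authors' implementation: the record does not give the distance measure or the weighting function, so the mean endpoint error, the exponential decay, and the names `endpoint_error`, `noise_weight`, `weighted_flow_loss`, and `sharpness` are all assumptions made for illustration. Flow fields are assumed to be already extracted and flattened to one (u, v) vector per pixel.

```python
import math
from typing import Sequence, Tuple

# Assumed representation: a flattened flow field, one (u, v) vector per pixel.
Flow = Sequence[Tuple[float, float]]


def endpoint_error(flow_gen: Flow, flow_gt: Flow) -> float:
    """Mean Euclidean distance between corresponding flow vectors."""
    if len(flow_gen) != len(flow_gt) or len(flow_gt) == 0:
        raise ValueError("flow fields must be non-empty and the same size")
    total = sum(
        math.hypot(ug - ut, vg - vt)
        for (ug, vg), (ut, vt) in zip(flow_gen, flow_gt)
    )
    return total / len(flow_gt)


def noise_weight(t: int, num_steps: int, sharpness: float = 5.0) -> float:
    """Hypothetical weight: 1.0 at t = 0 (no noise), decaying as t grows.

    The exponential form and the default sharpness are illustrative
    assumptions, not values taken from the paper.
    """
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    return math.exp(-sharpness * t / num_steps)


def weighted_flow_loss(flow_gen: Flow, flow_gt: Flow, t: int, num_steps: int) -> float:
    """Flow discrepancy scaled by a weight that shrinks at noisy steps."""
    return noise_weight(t, num_steps) * endpoint_error(flow_gen, flow_gt)


if __name__ == "__main__":
    gt = [(1.0, 0.0), (0.0, 1.0)]
    gen = [(1.0, 0.0), (3.0, 5.0)]
    for step in (0, 500, 1000):
        print(step, weighted_flow_loss(gen, gt, step, 1000))
```

In this sketch the same flow discrepancy contributes less to the loss at later (noisier) steps, which reflects the abstract's point that flow estimates are unreliable under high noise.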
