Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Cui, Fei; Fang, Jiaojiao; Wu, Xiaojiang; Lai, Zelong; Yang, Mengke; Jia, Menghan; Liu, Guizhong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.11576

_version_	1866929317544460288
author	Cui, Fei; Fang, Jiaojiao; Wu, Xiaojiang; Lai, Zelong; Yang, Mengke; Jia, Menghan; Liu, Guizhong
author_facet	Cui, Fei; Fang, Jiaojiao; Wu, Xiaojiang; Lai, Zelong; Yang, Mengke; Jia, Menghan; Liu, Guizhong
contents	Stochastic video prediction enables the consideration of uncertainty in future motion, thereby providing a better reflection of the dynamic nature of the environment. Stochastic video prediction methods based on image auto-regressive recurrent models need to feed their predictions back into the latent space. Conversely, state-space models, which decouple frame synthesis and temporal prediction, prove to be more efficient. However, inferring long-term temporal information about motion and generalizing to dynamic scenarios under non-stationary assumptions remain unresolved challenges. In this paper, we propose a state-space decomposition stochastic video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and stochastic motion prediction. Through adaptive decomposition, the model's generalization capability to dynamic scenarios is enhanced. In the context of motion prediction, obtaining a prior on the long-term trend of future motion is crucial. Thus, in the stochastic motion prediction branch, we infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames. Experimental results demonstrate that our model outperforms baselines on multiple datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_11576
institution	arXiv
publishDate	2024
record_format	arxiv
title	State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.11576

Similar Items