Bibliographic Details
Main Authors: Cui, Fei; Fang, Jiaojiao; Wu, Xiaojiang; Lai, Zelong; Yang, Mengke; Jia, Menghan; Liu, Guizhong
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2404.11576
_version_ 1866929317544460288
author Cui, Fei
Fang, Jiaojiao
Wu, Xiaojiang
Lai, Zelong
Yang, Mengke
Jia, Menghan
Liu, Guizhong
contents Stochastic video prediction accounts for uncertainty in future motion and so better reflects the dynamic nature of the environment. Stochastic video prediction methods based on image auto-regressive recurrent models need to feed their predictions back into the latent space. In contrast, state-space models, which decouple frame synthesis from temporal prediction, prove to be more efficient. However, inferring long-term temporal information about motion and generalizing to dynamic scenarios under non-stationary assumptions remain unresolved challenges. In this paper, we propose a state-space decomposition stochastic video prediction model that decomposes overall video frame generation into deterministic appearance prediction and stochastic motion prediction. This adaptive decomposition enhances the model's ability to generalize to dynamic scenarios. For motion prediction, obtaining a prior on the long-term trend of future motion is crucial. Thus, in the stochastic motion prediction branch, we infer the long-term motion trend from the conditional frames to guide the generation of future frames that are highly consistent with those conditional frames. Experimental results demonstrate that our model outperforms baselines on multiple datasets.
format Preprint
id arxiv_https___arxiv_org_abs_2404_11576
institution arXiv
publishDate 2024
record_format arxiv
title State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2404.11576