Staff View: FORESCENE: FOREcasting human activity via latent SCENE graphs diffusion :: Library Catalog

Bibliographic Details
Main Authors:	Alliegro, Antonio; Pistilli, Francesca; Tommasi, Tatiana; Averta, Giuseppe
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.06182

_version_	1866916647045955584
author	Alliegro, Antonio; Pistilli, Francesca; Tommasi, Tatiana; Averta, Giuseppe
author_facet	Alliegro, Antonio; Pistilli, Francesca; Tommasi, Tatiana; Averta, Giuseppe
contents	Forecasting human-environment interactions in daily activities is challenging due to the high variability of human behavior. While predicting directly from videos is possible, it is limited by confounding factors like irrelevant objects or background noise that do not contribute to the interaction. A promising alternative is using Scene Graphs (SGs) to track only the relevant elements. However, current methods for forecasting future SGs face significant challenges and often rely on unrealistic assumptions, such as fixed objects over time, limiting their applicability to long-term activities where interacted objects may appear or disappear. In this paper, we introduce FORESCENE, a novel framework for Scene Graph Anticipation (SGA) that predicts both object and relationship evolution over time. FORESCENE encodes observed video segments into a latent representation using a tailored Graph Auto-Encoder and forecasts future SGs using a Latent Diffusion Model (LDM). Our approach enables continuous prediction of interaction dynamics without making assumptions on the graph's content or structure. We evaluate FORESCENE on the Action Genome dataset, where it outperforms existing SGA methods while solving a significantly more complex task.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_06182
institution	arXiv
publishDate	2025
record_format	arxiv
title	FORESCENE: FOREcasting human activity via latent SCENE graphs diffusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.06182

Similar Items