MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Quattrocchi, Camillo; Furnari, Antonino; Di Mauro, Daniele; Giuffrida, Mario Valerio; Farinella, Giovanni Maria
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2312.02638

_version_	1866911956345028608
author	Quattrocchi, Camillo; Furnari, Antonino; Di Mauro, Daniele; Giuffrida, Mario Valerio; Farinella, Giovanni Maria
author_facet	Quattrocchi, Camillo; Furnari, Antonino; Di Mauro, Daniele; Giuffrida, Mario Valerio; Farinella, Giovanni Maria
contents	We consider the problem of transferring a temporal action segmentation system initially designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras capture video data. The conventional supervised approach requires the collection and labeling of a new set of egocentric videos to adapt the model, which is costly and time-consuming. Instead, we propose a novel methodology which performs the adaptation leveraging existing labeled exocentric videos and a new set of unlabeled, synchronized exocentric-egocentric video pairs, for which temporal action segmentation annotations do not need to be collected. We implement the proposed methodology with an approach based on knowledge distillation, which we investigate both at the feature and Temporal Action Segmentation model level. Experiments on Assembly101 and EgoExo4D demonstrate the effectiveness of the proposed method against classic unsupervised domain adaptation and temporal alignment approaches. Without bells and whistles, our best model performs on par with supervised approaches trained on labeled egocentric data, without ever seeing a single egocentric label, achieving a +15.99 improvement in the edit score (28.59 vs 12.60) on the Assembly101 dataset compared to a baseline model trained solely on exocentric data. In similar settings, our method also improves edit score by +3.32 on the challenging EgoExo4D benchmark. Code is available here: https://github.com/fpv-iplab/synchronization-is-all-you-need.
format	Preprint
id	arxiv_https___arxiv_org_abs_2312_02638
institution	arXiv
publishDate	2023
record_format	arxiv
title	Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2312.02638
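
The abstract in the contents field reports results as an edit score (28.59 vs 12.60 on Assembly101) without defining the metric. The sketch below assumes the segmental edit score as it is commonly computed in temporal action segmentation: per-frame labels are collapsed into an ordered list of segments, and the Levenshtein distance between the predicted and ground-truth segment lists is normalized by the longer of the two. The function names and the toy labels are hypothetical and are not taken from the authors' repository.

```python
from itertools import groupby


def segments(frame_labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_score(pred_frames, gt_frames):
    """Assumed definition: 100 * (1 - normalized segment-level edit distance)."""
    p, g = segments(pred_frames), segments(gt_frames)
    if not p and not g:
        return 100.0
    return (1 - levenshtein(p, g) / max(len(p), len(g))) * 100


# Toy example: the prediction briefly flips back to "a" inside the "b" segment,
# which adds two spurious segments (over-segmentation).
gt = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
pred = ["a"] * 3 + ["b"] * 2 + ["a"] * 1 + ["b"] * 2 + ["c"] * 4

print(segments(pred))         # segment-level view of the prediction
print(edit_score(pred, gt))   # lower than 100 because of the extra segments
print(round(28.59 - 12.60, 2))  # the improvement quoted in the abstract
```

Because the metric compares segment order rather than frame-by-frame agreement, a single mislabeled frame in the middle of an action costs as much as a long erroneous segment, which is why it is usually reported alongside frame-level accuracy.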
