Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Vilaça, Luís; Yu, Yi; Viana, Paula
Format:	Preprint
Published:	2022
Subjects:	Multimedia; Computer Vision and Pattern Recognition; Information Retrieval; Machine Learning; Audio and Speech Processing; 68T99
Online Access:	https://arxiv.org/abs/2202.13673

_version_	1866908732960538624
author	Vilaça, Luís; Yu, Yi; Viana, Paula
author_facet	Vilaça, Luís; Yu, Yi; Viana, Paula
contents	Audio-visual correlation learning aims to capture essential correspondences between audio and video and to understand the natural phenomena that link them. With the rapid growth of deep learning, this emerging research topic has attracted increasing attention. Over the past few years, various methods and datasets have been proposed for audio-visual correlation learning, which motivates us to compile a comprehensive survey. This survey focuses on state-of-the-art (SOTA) models used to learn correlations between audio and video, and also discusses task definitions and paradigms applied in AI multimedia. In addition, we investigate objective functions frequently used to optimize audio-visual correlation learning models and discuss how audio-visual data is exploited in the optimization process. Most importantly, we provide an extensive comparison and summary of recent progress in SOTA audio-visual correlation learning and discuss future research directions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2202_13673
institution	arXiv
publishDate	2022
record_format	arxiv
title	Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
topic	Multimedia; Computer Vision and Pattern Recognition; Information Retrieval; Machine Learning; Audio and Speech Processing; 68T99
url	https://arxiv.org/abs/2202.13673
