Bibliographic Details
Main authors: Vilaça, Luís; Yu, Yi; Viana, Paula
Format: Preprint
Published: 2022
Subjects:
Online access: https://arxiv.org/abs/2202.13673
_version_ 1866908732960538624
author Vilaça, Luís
Yu, Yi
Viana, Paula
author_facet Vilaça, Luís
Yu, Yi
Viana, Paula
contents Audio-visual correlation learning aims to capture essential correspondences and understand natural phenomena between audio and video. With the rapid growth of deep learning, increasing attention has been paid to this emerging research issue. Over the past few years, various methods and datasets have been proposed for audio-visual correlation learning, which motivates us to compile a comprehensive survey. This survey paper focuses on state-of-the-art (SOTA) models used to learn correlations between audio and video, and also discusses some task definitions and paradigms applied in AI multimedia. In addition, we investigate some objective functions frequently used for optimizing audio-visual correlation learning models and discuss how audio-visual data is exploited in the optimization process. Most importantly, we provide an extensive comparison and summary of recent progress in SOTA audio-visual correlation learning and discuss future research directions.
format Preprint
id arxiv_https___arxiv_org_abs_2202_13673
institution arXiv
publishDate 2022
record_format arxiv
title Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
topic Multimedia
Computer Vision and Pattern Recognition
Information Retrieval
Machine Learning
Audio and Speech Processing
68T99
url https://arxiv.org/abs/2202.13673