Bibliographic Details
Main Authors: Fernandes, Jose Geraldo, Nascimento, Sinval, Dominguete, Daniel, Oliveira, André, Rotsen, Lucas, Souza, Gabriel, Brochero, David, Facury, Luiz, Vilela, Mateus, Costa, Hebert, Coelho, Frederico, Braga, Antônio P.
Format: Preprint
Published: 2024
Subjects: Audio and Speech Processing
Online Access: https://arxiv.org/abs/2407.17430
author Fernandes, Jose Geraldo
Nascimento, Sinval
Dominguete, Daniel
Oliveira, André
Rotsen, Lucas
Souza, Gabriel
Brochero, David
Facury, Luiz
Vilela, Mateus
Costa, Hebert
Coelho, Frederico
Braga, Antônio P.
contents Synchronizing audio with visuals is crucial in many applications, such as creating graphic animations for films or games, translating movie audio into different languages, and developing metaverse applications. This review explores methodologies for generating realistic facial animations from audio inputs, highlighting generative and adaptive models. It addresses challenges such as model training costs, dataset availability, and the distribution of silent moments in audio data, and presents solutions intended to enhance performance and realism. The review also introduces a new taxonomy that categorizes audio-visual synchronization methods by logistical aspects, with the aim of advancing virtual assistants, gaming, and interactive digital media.
format Preprint
id arxiv_https___arxiv_org_abs_2407_17430
institution arXiv
publishDate 2024
record_format arxiv
title A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation
topic Audio and Speech Processing
url https://arxiv.org/abs/2407.17430