MARC View: :: Library Catalog


Bibliographic Details
Main Authors:	Jin, Derong; Gao, Ruohan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Sound
Online Access:	https://arxiv.org/abs/2504.21847

_version_	1866915447861936128
author	Jin, Derong; Gao, Ruohan
author_facet	Jin, Derong; Gao, Ruohan
contents	An immersive acoustic experience enabled by spatial audio is just as crucial as the visual aspect in creating realistic virtual environments. However, existing methods for room impulse response estimation rely either on data-demanding learning-based models or computationally expensive physics-based modeling. In this work, we introduce Audio-Visual Differentiable Room Acoustic Rendering (AV-DAR), a framework that leverages visual cues extracted from multi-view images and acoustic beam tracing for physics-based room acoustic rendering. Experiments across six real-world environments from two datasets demonstrate that our multimodal, physics-based approach is efficient, interpretable, and accurate, significantly outperforming a series of prior methods. Notably, on the Real Acoustic Field dataset, AV-DAR achieves comparable performance to models trained on 10 times more data while delivering relative gains ranging from 16.6% to 50.9% when trained at the same scale.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_21847
institution	arXiv
publishDate	2025
record_format	arxiv
title	Differentiable Room Acoustic Rendering with Multi-View Vision Priors
topic	Computer Vision and Pattern Recognition; Sound
url	https://arxiv.org/abs/2504.21847
