Library Catalog

Bibliographic Details
Main Authors:	Airale, Louis; Pajot, Adrien; Linossier, Juliette
Format:	Preprint
Published:	2024
Subjects:	Sound; Computer Vision and Pattern Recognition; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2412.03633

Similar Items

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
by: Airale, Louis, et al.
Published: (2023)

Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data
by: Doell, Michael, et al.
Published: (2024)

SoundCam: A Dataset for Finding Humans Using Room Acoustics
by: Wang, Mason, et al.
Published: (2023)

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
by: Chen, Ziyang, et al.
Published: (2024)

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
by: Zhang, Zhedong, et al.
Published: (2025)

Improving Acoustic Scene Classification with City Features
by: Cai, Yiqiang, et al.
Published: (2025)

SOAF: Scene Occlusion-aware Neural Acoustic Field
by: Gao, Huiyu, et al.
Published: (2024)

Few-shot Acoustic Synthesis with Multimodal Flow Matching
by: Brunetto, Amandine
Published: (2026)

Modeling and Driving Human Body Soundfields through Acoustic Primitives
by: Huang, Chao, et al.
Published: (2024)

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
by: Ahn, Byeongjoo, et al.
Published: (2023)

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
by: Brunetto, Amandine, et al.
Published: (2024)

Decoding Emotions: Unveiling Facial Expressions through Acoustic Sensing with Contrastive Attention
by: Wang, Guangjing, et al.
Published: (2024)

How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
by: Saad, Mahnoor Fatima, et al.
Published: (2025)

RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling
by: Pham, Long-Khanh, et al.
Published: (2025)

Sonicmesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals
by: Liang, Xiaoxuan, et al.
Published: (2024)

Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition
by: Li, Zeyu, et al.
Published: (2024)

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
by: Sun, Peiwen, et al.
Published: (2024)

ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
by: Somayazulu, Arjun, et al.
Published: (2024)

MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
by: Pham, Trung X., et al.
Published: (2024)

AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
by: Baek, Hadam, et al.
Published: (2025)

End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation
by: Di Pierno, Andrea, et al.
Published: (2025)

Improving Bird Classification with Primary Color Additives
by: R, Ezhini Rasendiran, et al.
Published: (2025)

Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks
by: Erattakulangara, Subin, et al.
Published: (2025)

Acoustic Scene Classification: A Competition Review
by: Gharib, Shayan, et al.
Published: (2018)

SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
by: He, Yuhang, et al.
Published: (2024)

CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization
by: Huang, Liangbin, et al.
Published: (2026)

Benchmarking Machine Learning Methods for Distributed Acoustic Sensing
by: Shi, Shuaikai, et al.
Published: (2025)

DiffSSD: A Diffusion-Based Dataset For Speech Forensics
by: Bhagtani, Kratika, et al.
Published: (2024)

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
by: Chi, Xiaowei, et al.
Published: (2024)

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
by: Xing, Yazhou, et al.
Published: (2024)

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning
by: Sun, Luoyi, et al.
Published: (2023)

The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization
by: Yaman, Ilayda, et al.
Published: (2023)

SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
by: Wang, Hao, et al.
Published: (2024)

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
by: Ma, Jian, et al.
Published: (2024)

LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
by: Yaman, Ilayda, et al.
Published: (2023)

MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation
by: Takahashi, Akira, et al.
Published: (2026)

AISHELL6-whisper: A Chinese Mandarin Audio-visual Whisper Speech Dataset with Speech Recognition Baselines
by: Li, Cancan, et al.
Published: (2025)

SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring
by: Zabidi, Muhammad Mun'im Ahmad, et al.
Published: (2026)

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
by: Cai, Kevin, et al.
Published: (2024)

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
by: Choi, Jeongsoo, et al.
Published: (2024)