| Main Authors: | Airale, Louis; Pajot, Adrien; Linossier, Juliette |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.03633 |
Similar Items
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
by: Airale, Louis, et al.
Published: (2023)
Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data
by: Doell, Michael, et al.
Published: (2024)
SoundCam: A Dataset for Finding Humans Using Room Acoustics
by: Wang, Mason, et al.
Published: (2023)
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
by: Chen, Ziyang, et al.
Published: (2024)
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
by: Zhang, Zhedong, et al.
Published: (2025)
Improving Acoustic Scene Classification with City Features
by: Cai, Yiqiang, et al.
Published: (2025)
SOAF: Scene Occlusion-aware Neural Acoustic Field
by: Gao, Huiyu, et al.
Published: (2024)
Few-shot Acoustic Synthesis with Multimodal Flow Matching
by: Brunetto, Amandine
Published: (2026)
Modeling and Driving Human Body Soundfields through Acoustic Primitives
by: Huang, Chao, et al.
Published: (2024)
Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
by: Ahn, Byeongjoo, et al.
Published: (2023)
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
by: Brunetto, Amandine, et al.
Published: (2024)
Decoding Emotions: Unveiling Facial Expressions through Acoustic Sensing with Contrastive Attention
by: Wang, Guangjing, et al.
Published: (2024)
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
by: Saad, Mahnoor Fatima, et al.
Published: (2025)
RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling
by: Pham, Long-Khanh, et al.
Published: (2025)
Sonicmesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals
by: Liang, Xiaoxuan, et al.
Published: (2024)
Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition
by: Li, Zeyu, et al.
Published: (2024)
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
by: Sun, Peiwen, et al.
Published: (2024)
ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
by: Somayazulu, Arjun, et al.
Published: (2024)
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
by: Pham, Trung X., et al.
Published: (2024)
AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
by: Baek, Hadam, et al.
Published: (2025)
End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation
by: Di Pierno, Andrea, et al.
Published: (2025)
Improving Bird Classification with Primary Color Additives
by: R, Ezhini Rasendiran, et al.
Published: (2025)
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks
by: Erattakulangara, Subin, et al.
Published: (2025)
Acoustic Scene Classification: A Competition Review
by: Gharib, Shayan, et al.
Published: (2018)
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
by: He, Yuhang, et al.
Published: (2024)
CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization
by: Huang, Liangbin, et al.
Published: (2026)
Benchmarking Machine Learning Methods for Distributed Acoustic Sensing
by: Shi, Shuaikai, et al.
Published: (2025)
DiffSSD: A Diffusion-Based Dataset For Speech Forensics
by: Bhagtani, Kratika, et al.
Published: (2024)
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
by: Chi, Xiaowei, et al.
Published: (2024)
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
by: Xing, Yazhou, et al.
Published: (2024)
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning
by: Sun, Luoyi, et al.
Published: (2023)
The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization
by: Yaman, Ilayda, et al.
Published: (2023)
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
by: Wang, Hao, et al.
Published: (2024)
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
by: Ma, Jian, et al.
Published: (2024)
LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
by: Yaman, Ilayda, et al.
Published: (2023)
MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation
by: Takahashi, Akira, et al.
Published: (2026)
AISHELL6-whisper: A Chinese Mandarin Audio-visual Whisper Speech Dataset with Speech Recognition Baselines
by: Li, Cancan, et al.
Published: (2025)
SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring
by: Zabidi, Muhammad Mun'im Ahmad, et al.
Published: (2026)
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
by: Cai, Kevin, et al.
Published: (2024)
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
by: Choi, Jeongsoo, et al.
Published: (2024)