| Main Authors: | Han, Chaeyeon, Seshadri, Pavan, Ding, Yiwei, Posner, Noah, Koo, Bon Woo, Agrawal, Animesh, Lerch, Alexander, Guhathakurta, Subhrajit |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.09998 |
Similar Items
ASPED: An Audio Dataset for Detecting Pedestrians
by: Seshadri, Pavan, et al.
Published: (2023)
Audio-Based Pedestrian Detection in the Presence of Vehicular Noise
by: Kim, Yonghyun, et al.
Published: (2025)
Building Audio-Visual Digital Twins with Smartphones
by: Lan, Zitong, et al.
Published: (2025)
Can Large Language Models Predict Audio Effects Parameters from Natural Language?
by: Doh, Seungheon, et al.
Published: (2025)
CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
by: Oh, Hyunwoo, et al.
Published: (2025)
MuseAgent-1: Interactive Grounded Multimodal Understanding of Music Scores and Performance Audio
by: Zhao, Qihao, et al.
Published: (2026)
Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
by: Su, Yi, et al.
Published: (2025)
EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
by: Chang, Jung-Woo, et al.
Published: (2024)
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
by: Huang, Zhiqi, et al.
Published: (2024)
LSTMSE-Net: Long Short Term Speech Enhancement Network for Audio-visual Speech Enhancement
by: Jain, Arnav, et al.
Published: (2024)
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction
by: Zhou, Wangjin, et al.
Published: (2024)
Trusted Fake Audio Detection Based on Dirichlet Distribution
by: Ding, Chi, et al.
Published: (2025)
LoVA: Long-form Video-to-Audio Generation
by: Cheng, Xin, et al.
Published: (2024)
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation
by: Shimada, Kazuki, et al.
Published: (2024)
Resource-Efficient Reference-Free Evaluation of Audio Captions
by: Mahfuz, Rehana, et al.
Published: (2024)
Efficient Video to Audio Mapper with Visual Scene Detection
by: Yi, Mingjing, et al.
Published: (2024)
Cinematic Audio Source Separation Using Visual Cues
by: Zhang, Kang, et al.
Published: (2026)
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
by: Zhao, Jinzheng, et al.
Published: (2023)
Multimodal Emotion Recognition from Raw Audio with Sinc-convolution
by: Zhang, Xiaohui, et al.
Published: (2024)
Zero-Shot Fake Video Detection by Audio-Visual Consistency
by: Li, Xiaolou, et al.
Published: (2024)
Audio-Visual Speech Separation via Bottleneck Iterative Network
by: Zhang, Sidong, et al.
Published: (2025)
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)
DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training
by: Liu, Shengqiang, et al.
Published: (2024)
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
by: Shi, Jiatong, et al.
Published: (2024)
Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities
by: Sudarsanam, Parthasaarathy, et al.
Published: (2025)
Dialogue Understandability: Why are we streaming movies with subtitles?
by: Martinez, Helard Becerra, et al.
Published: (2024)
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
by: Yu, Fan, et al.
Published: (2024)
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
by: Niu, Xinlei, et al.
Published: (2024)
MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer
by: Yao, Dong, et al.
Published: (2023)
Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection
by: Huang, Lian, et al.
Published: (2024)
POLIPHONE: A Dataset for Smartphone Model Identification from Audio Recordings
by: Salvi, Davide, et al.
Published: (2024)
StereoFoley: Object-Aware Stereo Audio Generation from Video
by: Karchkhadze, Tornike, et al.
Published: (2025)
Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition
by: Liu, Qianhui, et al.
Published: (2024)
Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech
by: Niu, Xinlei, et al.
Published: (2025)
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
by: He, Mao-Kui, et al.
Published: (2024)
RVCBench: Benchmarking the Robustness of Voice Cloning Across Modern Audio Generation Models
by: Jin, Ruinan, et al.
Published: (2026)
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
by: Ren, Yong, et al.
Published: (2024)
V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation
by: Chan, Nolan, et al.
Published: (2026)
FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation
by: Yan, Jialin, et al.
Published: (2025)
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
by: Lei, Ke, et al.
Published: (2026)