:: Library Catalog

Bibliographic Details
Main Authors:	Han, Chaeyeon; Seshadri, Pavan; Ding, Yiwei; Posner, Noah; Koo, Bon Woo; Agrawal, Animesh; Lerch, Alexander; Guhathakurta, Subhrajit
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Machine Learning; Multimedia; Sound
Online Access:	https://arxiv.org/abs/2406.09998

Similar Items

ASPED: An Audio Dataset for Detecting Pedestrians
by: Seshadri, Pavan, et al.
Published: (2023)

Audio-Based Pedestrian Detection in the Presence of Vehicular Noise
by: Kim, Yonghyun, et al.
Published: (2025)

Building Audio-Visual Digital Twins with Smartphones
by: Lan, Zitong, et al.
Published: (2025)

Can Large Language Models Predict Audio Effects Parameters from Natural Language?
by: Doh, Seungheon, et al.
Published: (2025)

CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
by: Oh, Hyunwoo, et al.
Published: (2025)

MuseAgent-1: Interactive Grounded Multimodal Understanding of Music Scores and Performance Audio
by: Zhao, Qihao, et al.
Published: (2026)

Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
by: Su, Yi, et al.
Published: (2025)

EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
by: Chang, Jung-Woo, et al.
Published: (2024)

Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
by: Huang, Zhiqi, et al.
Published: (2024)

LSTMSE-Net: Long Short Term Speech Enhancement Network for Audio-visual Speech Enhancement
by: Jain, Arnav, et al.
Published: (2024)

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction
by: Zhou, Wangjin, et al.
Published: (2024)

Trusted Fake Audio Detection Based on Dirichlet Distribution
by: Ding, Chi, et al.
Published: (2025)

LoVA: Long-form Video-to-Audio Generation
by: Cheng, Xin, et al.
Published: (2024)

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation
by: Shimada, Kazuki, et al.
Published: (2024)

Resource-Efficient Reference-Free Evaluation of Audio Captions
by: Mahfuz, Rehana, et al.
Published: (2024)

Efficient Video to Audio Mapper with Visual Scene Detection
by: Yi, Mingjing, et al.
Published: (2024)

Cinematic Audio Source Separation Using Visual Cues
by: Zhang, Kang, et al.
Published: (2026)

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
by: Zhao, Jinzheng, et al.
Published: (2023)

Multimodal Emotion Recognition from Raw Audio with Sinc-convolution
by: Zhang, Xiaohui, et al.
Published: (2024)

Zero-Shot Fake Video Detection by Audio-Visual Consistency
by: Li, Xiaolou, et al.
Published: (2024)

Audio-Visual Speech Separation via Bottleneck Iterative Network
by: Zhang, Sidong, et al.
Published: (2025)

Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)

DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training
by: Liu, Shengqiang, et al.
Published: (2024)

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
by: Shi, Jiatong, et al.
Published: (2024)

Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities
by: Sudarsanam, Parthasaarathy, et al.
Published: (2025)

Dialogue Understandability: Why are we streaming movies with subtitles?
by: Martinez, Helard Becerra, et al.
Published: (2024)

LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
by: Yu, Fan, et al.
Published: (2024)

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
by: Niu, Xinlei, et al.
Published: (2024)

MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer
by: Yao, Dong, et al.
Published: (2023)

Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection
by: Huang, Lian, et al.
Published: (2024)

POLIPHONE: A Dataset for Smartphone Model Identification from Audio Recordings
by: Salvi, Davide, et al.
Published: (2024)

StereoFoley: Object-Aware Stereo Audio Generation from Video
by: Karchkhadze, Tornike, et al.
Published: (2025)

Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition
by: Liu, Qianhui, et al.
Published: (2024)

Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech
by: Niu, Xinlei, et al.
Published: (2025)

Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
by: He, Mao-Kui, et al.
Published: (2024)

RVCBench: Benchmarking the Robustness of Voice Cloning Across Modern Audio Generation Models
by: Jin, Ruinan, et al.
Published: (2026)

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
by: Ren, Yong, et al.
Published: (2024)

V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation
by: Chan, Nolan, et al.
Published: (2026)

FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation
by: Yan, Jialin, et al.
Published: (2025)

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
by: Lei, Ke, et al.
Published: (2026)