Library Catalog

Bibliographic Details
Main Authors:	Marszałek, Patryk, Rut, Maciej, Kawa, Piotr, Spurek, Przemysław, Syga, Piotr
Format:	Preprint
Published:	2025
Subjects:	Sound; Computer Vision and Pattern Recognition; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2503.02585

Similar Items

Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation
by: Shan, Sizhe, et al.
Published: (2025)

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
by: Müller, Nicolas M., et al.
Published: (2024)

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
by: Bai, Detao, et al.
Published: (2025)

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)

Learning Self-Supervised Audio-Visual Representations for Sound Recommendations
by: Krishnamurthy, Sudha
Published: (2024)

DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
by: Nakada, Shota, et al.
Published: (2024)

CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation
by: Chen, Yuanhong, et al.
Published: (2025)

Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
by: Liu, Chen, et al.
Published: (2025)

OmniAudio: Generating Spatial Audio from 360-Degree Video
by: Liu, Huadai, et al.
Published: (2025)

DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation
by: Tian, Jingqi, et al.
Published: (2025)

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio
by: Chen, Gongyu, et al.
Published: (2024)

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning
by: Sun, Luoyi, et al.
Published: (2023)

Learning to Highlight Audio by Watching Movies
by: Huang, Chao, et al.
Published: (2025)

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
by: Tseng, Yuan, et al.
Published: (2023)

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
by: Zhang, Haomin, et al.
Published: (2025)

ZeroSep: Separate Anything in Audio with Zero Training
by: Huang, Chao, et al.
Published: (2025)

Siamese Vision Transformers are Scalable Audio-visual Learners
by: Lin, Yan-Bo, et al.
Published: (2024)

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes
by: Ryu, Hyeonggon, et al.
Published: (2025)

JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
by: Kwon, Mingi, et al.
Published: (2025)

UniSync: A Unified Framework for Audio-Visual Synchronization
by: Feng, Tao, et al.
Published: (2025)

Dual Audio-Centric Modality Coupling for Talking Head Generation
by: Fu, Ao, et al.
Published: (2025)

Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling
by: Korbar, Bruno, et al.
Published: (2024)

SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio
by: Tegler, Erik, et al.
Published: (2024)

Deep Active Audio Feature Learning in Resource-Constrained Environments
by: Mohaimenuzzaman, Md, et al.
Published: (2023)

Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
by: Berghi, Davide, et al.
Published: (2024)

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
by: Yeo, Jeong Hun, et al.
Published: (2025)

UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
by: Lai, Yung-Hsuan, et al.
Published: (2025)

MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
by: Cappellazzo, Umberto, et al.
Published: (2025)

Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
by: Wang, Juncheng, et al.
Published: (2025)

SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving
by: Barik, Ayush, et al.
Published: (2026)

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
by: Majumder, Sagnik, et al.
Published: (2023)

Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition
by: Li, Zeyu, et al.
Published: (2024)

Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine Grained-Localization
by: Klein, Nicholas, et al.
Published: (2025)

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
by: Rouditchenko, Andrew, et al.
Published: (2025)

SSAVSV: Towards Unified Model for Self-Supervised Audio-Visual Speaker Verification
by: Rajasekhar, Gnana Praveen, et al.
Published: (2025)

Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models
by: Lee, Seung-jae, et al.
Published: (2025)

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs
by: Anand, et al.
Published: (2025)

Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention
by: Praveen, R. Gnana, et al.
Published: (2024)

ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
by: Atito, Sara, et al.
Published: (2022)