| Main Authors: | Jedrusiak, Mikel D., Harweg, Thomas, Haselhoff, Timo, Lawrence, Bryce T., Moebus, Susanne, Weichert, Frank |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2310.13404 |
Similar Items
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
by: Khanal, Subash, et al.
Published: (2024)
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
by: Kang, Minjae, et al.
Published: (2025)
Self-Supervised Audio-Visual Soundscape Stylization
by: Li, Tingle, et al.
Published: (2024)
Feature Selection via Graph Topology Inference for Soundscape Emotion Recognition
by: Rey, Samuel, et al.
Published: (2025)
Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs
by: Ooi, Kenneth, et al.
Published: (2023)
Robust Bioacoustic Detection via Richly Labelled Synthetic Soundscape Augmentation
by: Soltero, Kaspar, et al.
Published: (2025)
Generating Diverse Audio-Visual 360 Soundscapes for Sound Event Localization and Detection
by: Roman, Adrian S., et al.
Published: (2025)
ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes
by: Ooi, Kenneth, et al.
Published: (2022)
Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas
by: Lam, Bhan, et al.
Published: (2024)
Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks
by: Liu, Chang, et al.
Published: (2025)
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
by: Cappellazzo, Umberto, et al.
Published: (2025)
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
by: Sun, Peiwen, et al.
Published: (2024)
SSAVSV: Towards Unified Model for Self-Supervised Audio-Visual Speaker Verification
by: Rajasekhar, Gnana Praveen, et al.
Published: (2025)
Fine-grained Soundscape Control for Augmented Hearing
by: Oh, Seunghyun, et al.
Published: (2026)
Sound Tagging in Infant-centric Home Soundscapes
by: Khan, Mohammad Nur Hossain, et al.
Published: (2024)
Soundscape Captioning using Sound Affective Quality Network and Large Language Model
by: Hou, Yuanbo, et al.
Published: (2024)
Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework
by: Di Pierno, Andrea, et al.
Published: (2025)
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
by: Fu, Chaoyou, et al.
Published: (2025)
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
by: Zhang, Haomin, et al.
Published: (2025)
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
by: Lai, Yung-Hsuan, et al.
Published: (2025)
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
by: Rouditchenko, Andrew, et al.
Published: (2025)
Generating Moving 3D Soundscapes with Latent Diffusion Models
by: Templin, Christian, et al.
Published: (2025)
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
by: Hegde, Sindhu, et al.
Published: (2024)
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
by: Rouditchenko, Andrew, et al.
Published: (2024)
Misophonia Trigger Sound Detection on Synthetic Soundscapes Using a Hybrid Model with a Frozen Pre-Trained CNN and a Time-Series Module
by: Sashida, Kurumi, et al.
Published: (2026)
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
by: Pham, Trung X., et al.
Published: (2024)
Improving Acoustic Scene Classification with City Features
by: Cai, Yiqiang, et al.
Published: (2025)
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
by: Atito, Sara, et al.
Published: (2022)
Gotta Hear Them All: Towards Sound Source Aware Audio Generation
by: Guo, Wei, et al.
Published: (2024)
Emotional Vietnamese Speech-Based Depression Diagnosis Using Dynamic Attention Mechanism
by: D., Quang-Anh N., et al.
Published: (2024)
Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture
by: Richter, Julius, et al.
Published: (2025)
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
by: Jiang, Xilin, et al.
Published: (2024)
SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving
by: Barik, Ayush, et al.
Published: (2026)
Few-shot Acoustic Synthesis with Multimodal Flow Matching
by: Brunetto, Amandine
Published: (2026)
pycnet-audio: A Python package to support bioacoustics data processing
by: Ruff, Zachary J., et al.
Published: (2025)
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
by: Choi, Jeongsoo, et al.
Published: (2024)
Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes
by: Ryu, Hyeonggon, et al.
Published: (2025)
Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model
by: Sun, Changchang, et al.
Published: (2025)
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
by: Park, Young-Hu, et al.
Published: (2025)
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation
by: Shan, Sizhe, et al.
Published: (2025)