| Main Authors: | Sims, Ysobel, Mendes, Alexandre, Chalup, Stephan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.03771 |
Similar Items
Self-Supervised Learning for Few-Shot Bird Sound Classification
by: Moummad, Ilyass, et al.
Published: (2023)
Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)
Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
by: Moliner, Eloi, et al.
Published: (2023)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)
SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model
by: Niu, Xinlei, et al.
Published: (2024)
Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction
by: Liu, Renhang, et al.
Published: (2024)
Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)
Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
by: Akram, Ali, et al.
Published: (2024)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
Multi-modal Adversarial Training for Zero-Shot Voice Cloning
by: Janiczek, John, et al.
Published: (2024)
Advanced Framework for Animal Sound Classification With Features Optimization
by: Yang, Qiang, et al.
Published: (2024)
On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning
by: Tavares, Tiago, et al.
Published: (2024)
Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
by: Moummad, Ilyass, et al.
Published: (2024)
Feature Aggregation in Joint Sound Classification and Localization Neural Networks
by: Healy, Brendan, et al.
Published: (2023)
Analysis-Driven Procedural Generation of an Engine Sound Dataset with Embedded Control Annotations
by: Doerfler, Robin, et al.
Published: (2026)
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
by: Manor, Hila, et al.
Published: (2024)
Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds
by: Chang, Andrew, et al.
Published: (2025)
Lungmix: A Mixup-Based Strategy for Generalization in Respiratory Sound Classification
by: Ge, Shijia, et al.
Published: (2024)
Zero-shot Voice Conversion with Diffusion Transformers
by: Liu, Songting
Published: (2024)
Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
by: Heggan, Calum, et al.
Published: (2024)
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
by: Bae, Sangmin, et al.
Published: (2023)
Exploring Meta Information for Audio-based Zero-shot Bird Classification
by: Gebhard, Alexander, et al.
Published: (2023)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
by: Nishida, Tomoya, et al.
Published: (2024)
Classification of Short Segment Pediatric Heart Sounds Based on a Transformer-Based Convolutional Neural Network
by: Hassanuzzaman, Md, et al.
Published: (2024)
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses
by: Jeong, Seung Gyu, et al.
Published: (2025)
Reconstruction of Sound Field through Diffusion Models
by: Miotello, Federico, et al.
Published: (2023)
Investigating the Design Space of Diffusion Models for Speech Enhancement
by: Gonzalez, Philippe, et al.
Published: (2023)
Microphone Conversion: Mitigating Device Variability in Sound Event Classification
by: Ryu, Myeonghoon, et al.
Published: (2024)
SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction
by: Chen, Tuochao, et al.
Published: (2025)
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
by: Dagli, Rishit, et al.
Published: (2024)
Voice Impression Control in Zero-Shot TTS
by: Fujita, Kenichi, et al.
Published: (2025)
RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification
by: Bitterman, Jacob, et al.
Published: (2024)
The iNaturalist Sounds Dataset
by: Chasmai, Mustafa, et al.
Published: (2025)
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
by: Saito, Koichi, et al.
Published: (2024)
Sound Event Detection and Localization with Distance Estimation
by: Krause, Daniel Aleksander, et al.
Published: (2024)
Sound Tagging in Infant-centric Home Soundscapes
by: Khan, Mohammad Nur Hossain, et al.
Published: (2024)
Audio Geolocation: A Natural Sounds Benchmark
by: Chasmai, Mustafa, et al.
Published: (2025)