:: Library Catalog

Bibliographic Details
Main Authors:	Sims, Ysobel; Mendes, Alexandre; Chalup, Stephan
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2412.03771

Similar Items

Self-Supervised Learning for Few-Shot Bird Sound Classification
by: Moummad, Ilyass, et al.
Published: (2023)

Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)

Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
by: Moliner, Eloi, et al.
Published: (2023)

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)

Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)

SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model
by: Niu, Xinlei, et al.
Published: (2024)

Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction
by: Liu, Renhang, et al.
Published: (2024)

Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)

Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
by: Akram, Ali, et al.
Published: (2024)

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)

Multi-modal Adversarial Training for Zero-Shot Voice Cloning
by: Janiczek, John, et al.
Published: (2024)

Advanced Framework for Animal Sound Classification With Features Optimization
by: Yang, Qiang, et al.
Published: (2024)

On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning
by: Tavares, Tiago, et al.
Published: (2024)

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
by: Moummad, Ilyass, et al.
Published: (2024)

Feature Aggregation in Joint Sound Classification and Localization Neural Networks
by: Healy, Brendan, et al.
Published: (2023)

Analysis-Driven Procedural Generation of an Engine Sound Dataset with Embedded Control Annotations
by: Doerfler, Robin, et al.
Published: (2026)

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
by: Manor, Hila, et al.
Published: (2024)

Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds
by: Chang, Andrew, et al.
Published: (2025)

Lungmix: A Mixup-Based Strategy for Generalization in Respiratory Sound Classification
by: Ge, Shijia, et al.
Published: (2024)

Zero-shot Voice Conversion with Diffusion Transformers
by: Liu, Songting
Published: (2024)

Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
by: Heggan, Calum, et al.
Published: (2024)

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
by: Bae, Sangmin, et al.
Published: (2023)

Exploring Meta Information for Audio-based Zero-shot Bird Classification
by: Gebhard, Alexander, et al.
Published: (2023)

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)

Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
by: Nishida, Tomoya, et al.
Published: (2024)

Classification of Short Segment Pediatric Heart Sounds Based on a Transformer-Based Convolutional Neural Network
by: Hassanuzzaman, Md, et al.
Published: (2024)

Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses
by: Jeong, Seung Gyu, et al.
Published: (2025)

Reconstruction of Sound Field through Diffusion Models
by: Miotello, Federico, et al.
Published: (2023)

Investigating the Design Space of Diffusion Models for Speech Enhancement
by: Gonzalez, Philippe, et al.
Published: (2023)

Microphone Conversion: Mitigating Device Variability in Sound Event Classification
by: Ryu, Myeonghoon, et al.
Published: (2024)

SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction
by: Chen, Tuochao, et al.
Published: (2025)

SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
by: Dagli, Rishit, et al.
Published: (2024)

Voice Impression Control in Zero-Shot TTS
by: Fujita, Kenichi, et al.
Published: (2025)

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification
by: Bitterman, Jacob, et al.
Published: (2024)

The iNaturalist Sounds Dataset
by: Chasmai, Mustafa, et al.
Published: (2025)

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
by: Saito, Koichi, et al.
Published: (2024)

Sound Event Detection and Localization with Distance Estimation
by: Krause, Daniel Aleksander, et al.
Published: (2024)

Sound Tagging in Infant-centric Home Soundscapes
by: Khan, Mohammad Nur Hossain, et al.
Published: (2024)

Audio Geolocation: A Natural Sounds Benchmark
by: Chasmai, Mustafa, et al.
Published: (2025)