:: Library Catalog

Bibliographic Details
Main Authors:	Liem, Cynthia C. S., Taşcılar, Doğa, Demetriou, Andrew M.
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.03676

Similar Items

Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge
by: Shao, Keren, et al.
Published: (2024)

ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)

The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
by: Ma, Guobin, et al.
Published: (2026)

The ICASSP 2024 Audio Deep Packet Loss Concealment Challenge
by: Diener, Lorenz, et al.
Published: (2024)

ClaritySpeech: Dementia Obfuscation in Speech
by: Woszczyk, Dominika, et al.
Published: (2025)

Prosody-Driven Privacy-Preserving Dementia Detection
by: Woszczyk, Dominika, et al.
Published: (2024)

Unsupervised outlier detection to improve bird audio dataset labels
by: Collins, Bruce
Published: (2025)

SCORE-SET: A dataset of GuitarPro files for Music Phrase Generation and Sequence Learning
by: Begari, Vishakh
Published: (2025)

Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
by: Ghani, Burooj, et al.
Published: (2024)

SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
by: Ahmed, Tawsif, et al.
Published: (2025)

Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
by: Yu, Guochen, et al.
Published: (2024)

YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
by: Chang, Sungkyun, et al.
Published: (2024)

ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
by: Ni-Hahn, Stephen, et al.
Published: (2025)

MidiCaps: A large-scale MIDI dataset with text captions
by: Melechovsky, Jan, et al.
Published: (2024)

Detecting gamma-band responses to the speech envelope for the ICASSP 2024 Auditory EEG Decoding Signal Processing Grand Challenge
by: Thornton, Mike, et al.
Published: (2024)

U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation
by: Dang, Shaoxiang, et al.
Published: (2024)

Learning Disentangled Audio Representations through Controlled Synthesis
by: Brima, Yusuf, et al.
Published: (2024)

Identification of Cognitive Decline from Spoken Language through Feature Selection and the Bag of Acoustic Words Model
by: Niemelä, Marko, et al.
Published: (2024)

Mismatch-Robust Underwater Acoustic Localization Using A Differentiable Modular Forward Model
by: Kari, Dariush, et al.
Published: (2025)

Boosting keyword spotting through on-device learnable user speech characteristics
by: Cioflan, Cristian, et al.
Published: (2024)

Joint Source-Environment Adaptation for Deep Learning-Based Underwater Acoustic Source Ranging
by: Kari, Dariush, et al.
Published: (2025)

A Data-Centric Framework for Machine Listening Projects: Addressing Large-Scale Data Acquisition and Labeling through Active Learning
by: Naranjo-Alcazar, Javier, et al.
Published: (2024)

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
by: Siuzdak, Hubert
Published: (2023)

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)

SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
by: Tang, Beilong, et al.
Published: (2025)

Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds
by: Chang, Andrew, et al.
Published: (2025)

Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts
by: Xie, Yuan, et al.
Published: (2024)

RiTTA: Modeling Event Relations in Text-to-Audio Generation
by: He, Yuhang, et al.
Published: (2024)

Joint Source-Environment Adaptation of Data-Driven Underwater Acoustic Source Ranging Based on Model Uncertainty
by: Kari, Dariush, et al.
Published: (2025)

On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence
by: Ramasso, Emmanuel, et al.
Published: (2024)

ASTRA: Aligning Speech and Text Representations for ASR without Sampling
by: Gaur, Neeraj, et al.
Published: (2024)

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)

LLark: A Multimodal Instruction-Following Language Model for Music
by: Gardner, Josh, et al.
Published: (2023)

A Feature Engineering Approach for Literary and Colloquial Tamil Speech Classification using 1D-CNN
by: Nanmalar, M., et al.
Published: (2024)

FlowDec: A flow-based full-band general audio codec with high perceptual quality
by: Welker, Simon, et al.
Published: (2025)

Multimodal Lyrics-Rhythm Matching
by: Liao, Callie C., et al.
Published: (2023)

Reconstruction of Sound Field through Diffusion Models
by: Miotello, Federico, et al.
Published: (2023)

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)

Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)