:: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Yujia; Tang, Zeyu; Qiu, Yiwen; Schölkopf, Bernhard; Zhang, Kun
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Sound; Audio and Speech Processing; Statistics Theory
Online Access:	https://arxiv.org/abs/2407.00529

Similar Items

Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
by: Yan, Yujia, et al.
Published: (2024)

Targeted Augmented Data for Audio Deepfake Detection
by: Astrid, Marcella, et al.
Published: (2024)

Identifying birdsong syllables without labelled data
by: Teng, Mélisande, et al.
Published: (2025)

MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection
by: Wang, Zehao, et al.
Published: (2024)

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
by: Huang, Yujia, et al.
Published: (2024)

Generative Semantic Communication for Text-to-Speech Synthesis
by: Zheng, Jiahao, et al.
Published: (2024)

Adversarial Domain Adaptation for Metal Cutting Sound Detection: Leveraging Abundant Lab Data for Scarce Industry Data
by: Mostafiz, Mir Imtiaz, et al.
Published: (2024)

LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification
by: Jazaeri, Niloofar, et al.
Published: (2026)

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2024)

ADD 2022: the First Audio Deep Synthesis Detection Challenge
by: Yi, Jiangyan, et al.
Published: (2022)

Mitigating Sex Bias in Audio Data-driven COPD and COVID-19 Breathing Pattern Detection Models
by: Pfeifer, Rachel, et al.
Published: (2024)

DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)

Audio-based Anomaly Detection in Industrial Machines Using Deep One-Class Support Vector Data Description
by: Kilickaya, Sertac, et al.
Published: (2024)

The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction
by: Dungrani, Dhruvin, et al.
Published: (2026)

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
by: Chen, Xing, et al.
Published: (2022)

Can Masked Autoencoders Also Listen to Birds?
by: Rauch, Lukas, et al.
Published: (2025)

Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification
by: Yao, Wei, et al.
Published: (2020)

Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
by: Sun, Zhaokai, et al.
Published: (2025)

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
by: Zhao, Guanlong, et al.
Published: (2023)

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters
by: Tesch, Kristina, et al.
Published: (2023)

Test-Time Training for Depression Detection
by: Dumpala, Sri Harsha, et al.
Published: (2024)

Does Audio Deepfake Detection Generalize?
by: Müller, Nicolas M., et al.
Published: (2022)

Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)

HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
by: Shashaank, N, et al.
Published: (2023)

A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era
by: Ren, Zhao, et al.
Published: (2023)

TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)

JAZZVAR: A Dataset of Variations found within Solo Piano Performances of Jazz Standards for Music Overpainting
by: Row, Eleanor, et al.
Published: (2023)

Generating Music with Structure Using Self-Similarity as Attention
by: Hager, Sophia, et al.
Published: (2024)

Do Foundational Audio Encoders Understand Music Structure?
by: Toyama, Keisuke, et al.
Published: (2025)

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)

Identification of Cognitive Decline from Spoken Language through Feature Selection and the Bag of Acoustic Words Model
by: Niemelä, Marko, et al.
Published: (2024)

Autoregressive Guidance of Deep Spatially Selective Filters using Bayesian Tracking for Efficient Extraction of Moving Speakers
by: Kienegger, Jakob, et al.
Published: (2026)

Sound Event Detection and Localization with Distance Estimation
by: Krause, Daniel Aleksander, et al.
Published: (2024)

Improving Generalization for AI-Synthesized Voice Detection
by: Ren, Hainan, et al.
Published: (2024)

Detecting music deepfakes is easy but actually hard
by: Afchar, Darius, et al.
Published: (2024)

Toward Faithful Explanations in Acoustic Anomaly Detection
by: Elrashid, Maab, et al.
Published: (2026)

Unified AI for Accurate Audio Anomaly Detection
by: Khaleghpour, Hamideh, et al.
Published: (2025)

Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios
by: Kienegger, Jakob, et al.
Published: (2025)

Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
by: Bralios, Dimitrios, et al.
Published: (2025)

Impact of Speech Mode in Automatic Pathological Speech Detection
by: Sheikh, Shakeel A., et al.
Published: (2024)