| Main Authors: | Zheng, Yujia, Tang, Zeyu, Qiu, Yiwen, Schölkopf, Bernhard, Zhang, Kun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.00529 |
Similar Items
Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
by: Yan, Yujia, et al.
Published: (2024)
Targeted Augmented Data for Audio Deepfake Detection
by: Astrid, Marcella, et al.
Published: (2024)
Identifying birdsong syllables without labelled data
by: Teng, Mélisande, et al.
Published: (2025)
MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection
by: Wang, Zehao, et al.
Published: (2024)
Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
by: Huang, Yujia, et al.
Published: (2024)
Generative Semantic Communication for Text-to-Speech Synthesis
by: Zheng, Jiahao, et al.
Published: (2024)
Adversarial Domain Adaptation for Metal Cutting Sound Detection: Leveraging Abundant Lab Data for Scarce Industry Data
by: Mostafiz, Mir Imtiaz, et al.
Published: (2024)
LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification
by: Jazaeri, Niloofar, et al.
Published: (2026)
emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2024)
ADD 2022: the First Audio Deep Synthesis Detection Challenge
by: Yi, Jiangyan, et al.
Published: (2022)
Mitigating Sex Bias in Audio Data-driven COPD and COVID-19 Breathing Pattern Detection Models
by: Pfeifer, Rachel, et al.
Published: (2024)
DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)
Audio-based Anomaly Detection in Industrial Machines Using Deep One-Class Support Vector Data Description
by: Kilickaya, Sertac, et al.
Published: (2024)
The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction
by: Dungrani, Dhruvin, et al.
Published: (2026)
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
by: Chen, Xing, et al.
Published: (2022)
Can Masked Autoencoders Also Listen to Birds?
by: Rauch, Lukas, et al.
Published: (2025)
Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification
by: Yao, Wei, et al.
Published: (2020)
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
by: Sun, Zhaokai, et al.
Published: (2025)
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
by: Zhao, Guanlong, et al.
Published: (2023)
Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters
by: Tesch, Kristina, et al.
Published: (2023)
Test-Time Training for Depression Detection
by: Dumpala, Sri Harsha, et al.
Published: (2024)
Does Audio Deepfake Detection Generalize?
by: Müller, Nicolas M., et al.
Published: (2022)
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)
HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
by: Shashaank, N, et al.
Published: (2023)
A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era
by: Ren, Zhao, et al.
Published: (2023)
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)
JAZZVAR: A Dataset of Variations found within Solo Piano Performances of Jazz Standards for Music Overpainting
by: Row, Eleanor, et al.
Published: (2023)
Generating Music with Structure Using Self-Similarity as Attention
by: Hager, Sophia, et al.
Published: (2024)
Do Foundational Audio Encoders Understand Music Structure?
by: Toyama, Keisuke, et al.
Published: (2025)
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)
Identification of Cognitive Decline from Spoken Language through Feature Selection and the Bag of Acoustic Words Model
by: Niemelä, Marko, et al.
Published: (2024)
Autoregressive Guidance of Deep Spatially Selective Filters using Bayesian Tracking for Efficient Extraction of Moving Speakers
by: Kienegger, Jakob, et al.
Published: (2026)
Sound Event Detection and Localization with Distance Estimation
by: Krause, Daniel Aleksander, et al.
Published: (2024)
Improving Generalization for AI-Synthesized Voice Detection
by: Ren, Hainan, et al.
Published: (2024)
Detecting music deepfakes is easy but actually hard
by: Afchar, Darius, et al.
Published: (2024)
Toward Faithful Explanations in Acoustic Anomaly Detection
by: Elrashid, Maab, et al.
Published: (2026)
Unified AI for Accurate Audio Anomaly Detection
by: Khaleghpour, Hamideh, et al.
Published: (2025)
Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios
by: Kienegger, Jakob, et al.
Published: (2025)
Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
by: Bralios, Dimitrios, et al.
Published: (2025)
Impact of Speech Mode in Automatic Pathological Speech Detection
by: Sheikh, Shakeel A., et al.
Published: (2024)