:: Library Catalog

Bibliographic Details
Main Authors:	Postma, Emmy; Tejedor-Garcia, Cristian
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.02078

Similar Items

Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers
by: Zhong, Terry Yi, et al.
Published: (2025)

Innovative Speech-Based Deep Learning Approaches for Parkinson's Disease Classification: A Systematic Review
by: van Gelderen, Lisanne, et al.
Published: (2024)

RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification
by: Zhong, Terry Yi, et al.
Published: (2025)

A Benchmark for Early-stage Parkinson's Disease Detection from Speech
by: Zhong, Terry Yi, et al.
Published: (2026)

Zero-Shot Speech LLMs for Multi-Aspect Evaluation of L2 Speech: Challenges and Opportunities
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

Zero-Shot Parkinson's Disease Detection from Speech: Comparing Large Audio and Language Models
by: Kabir, Muhammad Ashad, et al.
Published: (2026)

Evaluating Logit-Based GOP Scores for Mispronunciation Detection
by: Parikh, Aditya Kamlesh, et al.
Published: (2025)

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)

Rubric-Guided Fine-tuning of SpeechLLMs for Multi-Aspect, Multi-Rater L2 Reading-Speech Assessment
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)

Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge
by: Parikh, Aditya Kamlesh, et al.
Published: (2025)

Bilingual Dual-Head Deep Model for Parkinson's Disease Detection from Speech
by: La Quatra, Moreno, et al.
Published: (2025)

Reading Miscue Detection in Primary School through Automatic Speech Recognition
by: Gao, Lingyun, et al.
Published: (2024)

4,500 Seconds: Small Data Training Approaches for Deep UAV Audio Classification
by: Berg, Andrew P., et al.
Published: (2025)

Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
by: Zhou, Xuanru, et al.
Published: (2026)

Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models
by: Jung, Kyudan, et al.
Published: (2026)

Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank
by: Deng, Xuyao, et al.
Published: (2025)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio
by: Alonso-Jiménez, Pablo, et al.
Published: (2024)

DASB - Discrete Audio and Speech Benchmark
by: Mousavi, Pooneh, et al.
Published: (2024)

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)

MATE: Matryoshka Audio-Text Embeddings for Open-Vocabulary Keyword Spotting
by: Jung, Youngmoon, et al.
Published: (2026)

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)

Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
by: Nayeem, Md., et al.
Published: (2025)

JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
by: Zhang, Leying, et al.
Published: (2026)

SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation
by: Wang, Helin, et al.
Published: (2026)

NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
by: Han, Minglun, et al.
Published: (2024)

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)

Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model
by: Barnett, Julia, et al.
Published: (2024)

Scenario of Use Scheme: Threat Model Specification for Speaker Privacy Protection in the Medical Domain
by: Rahman, Mehtab Ur, et al.
Published: (2024)

Fundamental Survey on Neuromorphic Based Audio Classification
by: Basu, Amlan, et al.
Published: (2025)

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio
by: Barański, Mateusz, et al.
Published: (2025)

LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement
by: Chen, Chih-Ning, et al.
Published: (2026)

Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
by: Singh, Robin, et al.
Published: (2026)

Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2024)

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
by: Xie, Yuankun, et al.
Published: (2024)

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)

SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training
by: Mei, Xinhao, et al.
Published: (2026)

Embedding Alignment in Code Generation for Audio
by: Kouteili, Sam, et al.
Published: (2025)