:: Library Catalog

Bibliographic Details
Main Authors:	van Gelderen, Lisanne; Tejedor-García, Cristian
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Computation and Language; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.17844

Similar Items

RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification
by: Zhong, Terry Yi, et al.
Published: (2025)

Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers
by: Zhong, Terry Yi, et al.
Published: (2025)

A Benchmark for Early-stage Parkinson's Disease Detection from Speech
by: Zhong, Terry Yi, et al.
Published: (2026)

Zero-Shot Speech LLMs for Multi-Aspect Evaluation of L2 Speech: Challenges and Opportunities
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

Rubric-Guided Fine-tuning of SpeechLLMs for Multi-Aspect, Multi-Rater L2 Reading-Speech Assessment
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)

Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
by: Postma, Emmy, et al.
Published: (2025)

Evaluating Logit-Based GOP Scores for Mispronunciation Detection
by: Parikh, Aditya Kamlesh, et al.
Published: (2025)

Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
by: Elmakies, Avishai, et al.
Published: (2025)

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
by: Wills, Simone, et al.
Published: (2023)

Detection and Forecasting of Parkinson Disease Progression from Speech Signal Features Using MultiLayer Perceptron and LSTM
by: Ali, Majid, et al.
Published: (2024)

SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2025)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech
by: R, Aron, et al.
Published: (2024)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
by: Vu, Tai
Published: (2025)

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)

An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders
by: Bai, Yu, et al.
Published: (2023)

Zero-Shot Parkinson's Disease Detection from Speech: Comparing Large Audio and Language Models
by: Kabir, Muhammad Ashad, et al.
Published: (2026)

Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)

Towards the Next Frontier in Speech Representation Learning Using Disentanglement
by: Krishna, Varun, et al.
Published: (2024)

FlashSpeech: Efficient Zero-Shot Speech Synthesis
by: Ye, Zhen, et al.
Published: (2024)

STTATTS: Unified Speech-To-Text And Text-To-Speech Model
by: Toyin, Hawau Olamide, et al.
Published: (2024)

A Chinese Heart Failure Status Speech Database with Universal and Personalised Classification
by: Pan, Yue, et al.
Published: (2025)

Advancing Speech Understanding in Speech-Aware Language Models with GRPO
by: Elmakies, Avishai, et al.
Published: (2025)

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
by: Peng, Puyuan, et al.
Published: (2024)

AFEN: Respiratory Disease Classification using Ensemble Learning
by: Nadkarni, Rahul, et al.
Published: (2024)

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey
by: Xie, Tianxin, et al.
Published: (2024)

TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
by: Kim, Taesoo, et al.
Published: (2025)

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
by: Ju, Zeqian, et al.
Published: (2024)

An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition
by: Wang, Yi-Cheng, et al.
Published: (2024)

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
by: Seth, Ashish, et al.
Published: (2024)

Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning
by: Guo, Zilu, et al.
Published: (2023)

EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2024)

Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
by: Gómez-Zaragozá, Lucía, et al.
Published: (2023)

Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
by: Djeffal, Noussaiba, et al.
Published: (2025)

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
by: Liu, Zhijun, et al.
Published: (2024)

Real-time Speech Summarization for Medical Conversations
by: Le-Duc, Khai, et al.
Published: (2024)