:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Parikh, Aditya Kamlesh, Tejedor-Garcia, Cristian, Cucchiarini, Catia, Strik, Helmer
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.02080
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Evaluating Logit-Based GOP Scores for Mispronunciation Detection
by: Parikh, Aditya Kamlesh, et al.
Published: (2025)

Rubric-Guided Fine-tuning of SpeechLLMs for Multi-Aspect, Multi-Rater L2 Reading-Speech Assessment
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

Zero-Shot Speech LLMs for Multi-Aspect Evaluation of L2 Speech: Challenges and Opportunities
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)

Reading Miscue Detection in Primary School through Automatic Speech Recognition
by: Gao, Lingyun, et al.
Published: (2024)

Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech
by: Lathouwers, Gus, et al.
Published: (2026)

Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics
by: Molenaar, Bo, et al.
Published: (2023)

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
by: Wills, Simone, et al.
Published: (2023)

An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders
by: Bai, Yu, et al.
Published: (2023)

Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
by: Postma, Emmy, et al.
Published: (2025)

Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models
by: Lee, Kyowoon, et al.
Published: (2025)

Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
by: Gómez-Zaragozá, Lucía, et al.
Published: (2023)

CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment
by: Liu, Hanwen, et al.
Published: (2026)

Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)

CTC-Assisted LLM-Based Contextual ASR
by: Yang, Guanrou, et al.
Published: (2024)

CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
by: Zhao, Rui, et al.
Published: (2024)

Innovative Speech-Based Deep Learning Approaches for Parkinson's Disease Classification: A Systematic Review
by: van Gelderen, Lisanne, et al.
Published: (2024)

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
by: Peng, Jing, et al.
Published: (2026)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
by: Jin, Sichen, et al.
Published: (2024)

Phonology-Guided Speech-to-Speech Translation for African Languages
by: Ochieng, Peter, et al.
Published: (2024)

RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification
by: Zhong, Terry Yi, et al.
Published: (2025)

Scenario of Use Scheme: Threat Model Specification for Speaker Privacy Protection in the Medical Domain
by: Rahman, Mehtab Ur, et al.
Published: (2024)

A Benchmark for Early-stage Parkinson's Disease Detection from Speech
by: Zhong, Terry Yi, et al.
Published: (2026)

PhonologyBench: Evaluating Phonological Skills of Large Language Models
by: Suvarna, Ashima, et al.
Published: (2024)

Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers
by: Zhong, Terry Yi, et al.
Published: (2025)

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
by: Grigoryan, Lilit, et al.
Published: (2025)

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)

Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR
by: Lu, Xugang, et al.
Published: (2025)

Less Peaky and More Accurate CTC Forced Alignment by Label Priors
by: Huang, Ruizhe, et al.
Published: (2024)

Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
by: Singh, Robin, et al.
Published: (2026)

Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)

Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach
by: Tu, Huu Tuong, et al.
Published: (2025)

Replay Attacks Against Audio Deepfake Detection
by: Müller, Nicolas, et al.
Published: (2025)

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
by: Wang, Xintong, et al.
Published: (2024)

Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis
by: Geng, Haopeng, et al.
Published: (2026)

LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
by: Lachenani, Sidahmed, et al.
Published: (2025)