Bibliographic Details
Main Authors: Parikh, Aditya Kamlesh; Tejedor-Garcia, Cristian; Cucchiarini, Catia; Strik, Helmer
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing; Artificial Intelligence
Online Access: https://arxiv.org/abs/2506.02080
_version_ 1866916924557885440
author Parikh, Aditya Kamlesh
Tejedor-Garcia, Cristian
Cucchiarini, Catia
Strik, Helmer
contents Computer-Assisted Pronunciation Training (CAPT) systems employ automatic measures of pronunciation quality, such as the goodness of pronunciation (GOP) metric. GOP relies on forced alignments, which are prone to labeling and segmentation errors due to acoustic variability. While alignment-free methods address these challenges, they are computationally expensive and scale poorly with phoneme sequence length and inventory size. To enhance efficiency, we introduce a substitution-aware alignment-free GOP that restricts phoneme substitutions based on phoneme clusters and common learner errors. We evaluated our GOP on two L2 English speech datasets: My Pronunciation Coach (MPC), which contains child speech, and SpeechOcean762, which includes child and adult speech. We compared RPS (restricted phoneme substitutions) and UPS (unrestricted phoneme substitutions) setups within alignment-free methods, which outperformed the baseline. We discuss our results and outline avenues for future research.
format Preprint
id arxiv_https___arxiv_org_abs_2506_02080
institution arXiv
publishDate 2025
record_format arxiv
title Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge
topic Audio and Speech Processing
Artificial Intelligence
url https://arxiv.org/abs/2506.02080