Bibliographic Details
Main Authors: Yang, Tzu-Hsuan, He, Yue-Yang, Chen, Berlin
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2506.19315
author Yang, Tzu-Hsuan
He, Yue-Yang
Chen, Berlin
contents Effective pronunciation feedback is critical in second language (L2) learning, for which computer-assisted pronunciation training (CAPT) systems often encompass two key tasks: automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD). Recent work has shown that joint modeling of these two tasks can yield mutual benefits. Our unified framework leverages Mamba, a selective state space model (SSM), while integrating phonological features and think token strategies to jointly enhance interpretability and fine-grained temporal reasoning in APA and MDD. To our knowledge, this is the first study to combine phonological attribution, SSM-based modeling, and prompting in CAPT. A series of experiments conducted on the speechocean762 benchmark demonstrates that our model consistently outperforms prior methods, particularly on the MDD task.
format Preprint
id arxiv_https___arxiv_org_abs_2506_19315
institution arXiv
publishDate 2025
record_format arxiv
title JCAPT: A Joint Modeling Approach for CAPT
topic Computation and Language
Artificial Intelligence
Audio and Speech Processing
url https://arxiv.org/abs/2506.19315