Staff View: JCAPT: A Joint Modeling Approach for CAPT :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Tzu-Hsuan; He, Yue-Yang; Chen, Berlin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.19315

_version_	1866913959549861888
author	Yang, Tzu-Hsuan; He, Yue-Yang; Chen, Berlin
author_facet	Yang, Tzu-Hsuan; He, Yue-Yang; Chen, Berlin
contents	Effective pronunciation feedback is critical in second language (L2) learning, for which computer-assisted pronunciation training (CAPT) systems often encompass two key tasks: automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD). Recent work has shown that joint modeling of these two tasks can yield mutual benefits. Our unified framework leverages Mamba, a selective state space model (SSM), while integrating phonological features and think token strategies to jointly enhance interpretability and fine-grained temporal reasoning in APA and MDD. To our knowledge, this is the first study to combine phonological attribution, SSM-based modeling, and prompting in CAPT. A series of experiments conducted on the speechocean762 benchmark demonstrates that our model consistently outperforms prior methods, particularly on the MDD task.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_19315
institution	arXiv
publishDate	2025
record_format	arxiv
title	JCAPT: A Joint Modeling Approach for CAPT
topic	Computation and Language; Artificial Intelligence; Audio and Speech Processing
url	https://arxiv.org/abs/2506.19315

Similar Items