| Main Authors: | Yang, Tzu-Hsuan; He, Yue-Yang; Chen, Berlin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language; Artificial Intelligence; Audio and Speech Processing |
| Online Access: | https://arxiv.org/abs/2506.19315 |
| _version_ | 1866913959549861888 |
|---|---|
| author | Yang, Tzu-Hsuan; He, Yue-Yang; Chen, Berlin |
| author_facet | Yang, Tzu-Hsuan; He, Yue-Yang; Chen, Berlin |
| contents | Effective pronunciation feedback is critical in second language (L2) learning, for which computer-assisted pronunciation training (CAPT) systems often encompass two key tasks: automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD). Recent work has shown that joint modeling of these two tasks can yield mutual benefits. Our unified framework leverages Mamba, a selective state space model (SSM), while integrating phonological features and think-token strategies to jointly enhance interpretability and fine-grained temporal reasoning in APA and MDD. To our knowledge, this is the first study to combine phonological attribution, SSM-based modeling, and prompting in CAPT. A series of experiments conducted on the speechocean762 benchmark demonstrates that our model consistently outperforms prior methods, particularly on the MDD task. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2506_19315 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | JCAPT: A Joint Modeling Approach for CAPT |
| topic | Computation and Language; Artificial Intelligence; Audio and Speech Processing |
| url | https://arxiv.org/abs/2506.19315 |