| Main Authors: | Chary, Luis Felipe; Ramirez, Miguel Arjona |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2509.03300 |
| _version_ | 1866915477841772544 |
|---|---|
| author | Chary, Luis Felipe; Ramirez, Miguel Arjona |
| author_facet | Chary, Luis Felipe; Ramirez, Miguel Arjona |
| contents | Grapheme-to-phoneme (G2P) conversion is a key front-end for text-to-speech (TTS), automatic speech recognition (ASR), speech-to-speech translation (S2ST) and alignment systems, especially across multiple Latin-script languages. We present LatPhon, a 7.5M-parameter Transformer jointly trained on six such languages: English, Spanish, French, Italian, Portuguese, and Romanian. On the public ipa-dict corpus, it attains a mean phoneme error rate (PER) of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%) while occupying 30 MB of memory, which makes on-device deployment feasible when needed. These results indicate that compact multilingual G2P can serve as a universal front-end for Latin-language speech pipelines. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2509_03300 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | LatPhon: Lightweight Multilingual G2P for Romance Languages and English |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2509.03300 |
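
The abstract reports results as phoneme error rate (PER). As a minimal sketch of how such a figure is commonly computed, the Python below divides the total Levenshtein edit distance between predicted and reference phoneme sequences by the total number of reference phonemes. The function names and the toy phoneme sequences are hypothetical illustrations, not taken from the preprint or from ipa-dict, and the record does not state whether the authors pool errors over the whole corpus (as done here) or average per-language scores.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference phoneme
                curr[j - 1] + 1,     # insert a hypothesis phoneme
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Pooled PER: total edits divided by total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of entries")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    return total_edits / total_phonemes


# Toy data (hypothetical): one substitution across eight reference phonemes.
refs = [["k", "a", "s", "a"], ["p", "e", "r", "o"]]
hyps = [["k", "a", "z", "a"], ["p", "e", "r", "o"]]
per = phoneme_error_rate(refs, hyps)
print(f"PER = {per:.3f}")
```

Pooling weights long words more heavily than short ones; averaging per-word or per-language rates instead would give a different number on the same predictions.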