Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Pathak, Utkarsh; Gunda, Chandra Sai Krishna; Prakash, Anusha; Agarwal, Keshav; Murthy, Hema A.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computer Vision and Pattern Recognition; I.5.4
Online Access:	https://arxiv.org/abs/2506.03884

author	Pathak, Utkarsh; Gunda, Chandra Sai Krishna; Prakash, Anusha; Agarwal, Keshav; Murthy, Hema A.
contents	Text-to-speech (TTS) systems typically require high-quality studio data and accurate transcriptions for training. India has 1,369 languages; the 22 official ones are written in 13 scripts. Training a TTS system for all these languages, most of which have no digital resources, seems a Herculean task. Our work focuses on zero-shot synthesis, particularly for languages whose scripts and phonotactics come from different families. The novelty of our work lies in augmenting a shared phone representation and modifying the text-parsing rules to match the phonotactics of the target language, which reduces synthesiser overhead and enables rapid adaptation. Intelligible and natural speech was generated for Sanskrit, Maharashtrian and Canara Konkani, Maithili, and Kurukh by leveraging linguistic connections across languages with suitable synthesisers. Evaluations confirm the effectiveness of this approach, highlighting its potential to expand speech technology access for under-represented languages.
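The abstract describes the key step only at a high level: text in the target language is mapped onto a shared phone representation, with parsing rules adjusted to the target language's phonotactics, so that a synthesiser trained on a related language can be reused. The Python sketch below illustrates that general idea with one well-known contrast and is not the authors' implementation. Hindi pronunciation usually drops the inherent word-final schwa (कमल read as "kamal"), whereas Sanskrit keeps it ("kamala"). The tiny character table, the phone labels, the `parse_word` function, and its `delete_final_schwa` flag are all hypothetical. The rule covers only word-final schwa, not the medial schwa deletion that real Hindi parsers also handle.

```python
# Hypothetical sketch: one Devanagari parser, two language-specific rule sets.
# Phone labels and the character subset are illustrative assumptions only.

CONSONANTS = {
    "क": "k", "त": "t", "द": "d", "न": "n",
    "म": "m", "र": "r", "ल": "l", "स": "s",
}
VOWEL_SIGNS = {
    "ा": "aa", "ि": "i", "ी": "ii", "ु": "u",
    "ू": "uu", "े": "e", "ो": "o",
}
INDEPENDENT_VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "उ": "u"}
VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant
SCHWA = "a"        # inherent vowel of a bare consonant


def parse_word(word: str, delete_final_schwa: bool) -> list[str]:
    """Map one Devanagari word to (hypothetical) shared phone labels.

    delete_final_schwa=True mimics Hindi-style parsing;
    delete_final_schwa=False keeps the inherent vowel, as in Sanskrit.
    """
    phones: list[str] = []
    chars = list(word)
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else None
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            # A bare consonant carries schwa unless a vowel sign or virama follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                phones.append(SCHWA)
        elif ch in VOWEL_SIGNS:
            phones.append(VOWEL_SIGNS[ch])
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
        elif ch == VIRAMA:
            continue  # already handled when the preceding consonant was read
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    # Language-specific phonotactic rule: drop an inherent word-final schwa.
    if delete_final_schwa and chars and chars[-1] in CONSONANTS:
        phones.pop()
    return phones


if __name__ == "__main__":
    for word in ("कमल", "राम"):
        print(word, "Hindi-style:", parse_word(word, delete_final_schwa=True))
        print(word, "Sanskrit-style:", parse_word(word, delete_final_schwa=False))
```

With these tables, `parse_word("कमल", delete_final_schwa=True)` returns `['k', 'a', 'm', 'a', 'l']`, while the Sanskrit-style call returns `['k', 'a', 'm', 'a', 'l', 'a']`. The design point is that the script table and phone inventory stay shared and only the rule flag changes, which is one plausible reading of "modifying the text parsing rules" rather than retraining the synthesiser.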
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_03884
institution	arXiv
publishDate	2025
record_format	arxiv
title	Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages
topic	Computation and Language; Computer Vision and Pattern Recognition; I.5.4
url	https://arxiv.org/abs/2506.03884

Similar Items