| Main authors: | , , |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo 2025 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.16258256 |
Summary:
- <p><strong><span><span lang="EN-GB">Abstract</span></span></strong><span lang="EN-GB"> </span></p> <p><span><span lang="EN-AU">Tamil is a classical language with a very ancient history and a distinct, rich literary and grammatical tradition. In any language, the way the people of a particular region express themselves is described as a dialect. Dialects are highly significant in language analysis, especially in dialect identification systems, mainly because most dialects of a language are extremely similar to one another. However, dialects cannot be regarded as fully distinct and separate entities, since they are tied to regional and often cultural characteristics. A survey of the available speech databases revealed that no standard speech database currently exists for the Tamil dialects in the field of speech processing. To fill this gap, efforts are directed toward creating a speech corpus that will facilitate dialect recognition for Tamil. This paper focuses on three main dialects of the Tamil language: Southern, Northern, and Western. In the present study, spoken Tamil is transcribed into written Tamil using an OpenAI Whisper model fine-tuned on the developed corpus. The Word Error Rate (WER) is used to measure the performance of the developed system.</span></span></p>
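
The abstract names Word Error Rate as the evaluation metric but does not say how it is computed. The sketch below shows the common definition, WER = (substitutions + deletions + insertions) / number of reference words, obtained from a word-level edit distance. The function name `word_error_rate`, the whitespace tokenization, and the sample strings are assumptions made for illustration; the authors' actual scoring procedure and any text normalization they apply are not described in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no normalization of case, punctuation, or spelling variants.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative placeholder strings, not data from the corpus:
# one substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, so it is not a percentage bounded by 100.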