

Bibliographic details
Main authors:	Saranya S, Sreeja K, Bharathi B
Format:	Digital resource
Language:	English
Published:	Zenodo, 2025
Keywords:	Fine-tuning, Gaussian Mixture model, MFCC, OpenAI, Transfer learning, Tamil dialect speech corpus, Word error rate
Online access:	https://doi.org/10.5281/zenodo.16258256

Abstract:

Tamil is a classical language with a very long history and a rich literary and grammatical tradition. In any language, the way the people of a particular region express themselves is described as a dialect. Dialects are important in language analysis, particularly for dialect identification systems, which are difficult to build because the dialects of a language are often highly similar to one another. Moreover, dialects cannot be regarded as fully distinct and separate entities, since they are tied to regional and often cultural characteristics. A survey of available speech databases revealed that no standard speech database of Tamil dialects currently exists in the field of speech processing. To fill this gap, this work develops a speech corpus to facilitate Tamil dialect recognition. The paper focuses on three main dialects of Tamil: Southern, Northern, and Western. Spoken Tamil is transcribed into written Tamil by identifying words with an OpenAI Whisper model fine-tuned on the developed corpus. The performance of the resulting system is measured using Word Error Rate (WER).
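Word Error Rate is the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to turn a reference transcript into the system's output, divided by the number of words N in the reference: WER = (S + D + I) / N. The following is a minimal sketch of that computation using a word-level edit-distance table. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no case or punctuation normalization, which may differ from how the authors scored their Whisper transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j];
    # the first row is j insertions from an empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Column 0: deleting all i reference words.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the hypothesis, WER can exceed 1.0 when the output is much longer than the reference. Established packages such as jiwer offer an equivalent `wer(reference, hypothesis)` function together with optional text transformations for normalizing transcripts before scoring.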
