Bibliographic Details
Main Authors: Emezue, Chris, Community, NaijaVoices, Awobade, Busayo, Owodunni, Abraham, Emezue, Handel, Emezue, Gloria Monica Tobechukwu, Emezue, Nefertiti Nneoma, Ogun, Sewade, Akinremi, Bunmi, Adelani, David Ifeoluwa, Pal, Chris
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2505.20564
author Emezue, Chris
Community, NaijaVoices
Awobade, Busayo
Owodunni, Abraham
Emezue, Handel
Emezue, Gloria Monica Tobechukwu
Emezue, Nefertiti Nneoma
Ogun, Sewade
Akinremi, Bunmi
Adelani, David Ifeoluwa
Pal, Chris
contents The development of high-performing, robust, and reliable speech technologies depends on large, high-quality datasets. However, African languages, including our focus languages Igbo, Hausa, and Yoruba, remain under-represented due to insufficient data. Popular voice-enabled technologies do not support any of the 2,000+ African languages, limiting accessibility for roughly one billion people. While previous dataset efforts exist for the target languages, they lack the scale and diversity needed for robust speech models. To bridge this gap, we introduce the NaijaVoices dataset, a 1,800-hour speech-text dataset with 5,000+ speakers. We outline our unique data collection approach, analyze its acoustic diversity, and demonstrate its impact through finetuning experiments on automatic speech recognition, achieving average WER improvements of 75.86% (Whisper), 52.06% (MMS), and 42.33% (XLSR). These results highlight NaijaVoices' potential to advance multilingual speech processing for African languages.
format Preprint
id arxiv_https___arxiv_org_abs_2505_20564
institution arXiv
publishDate 2025
record_format arxiv
title The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages
topic Computation and Language
url https://arxiv.org/abs/2505.20564