Bibliographic Details
Main Authors: Emezue, Chris Chinenye, Okoh, Ifeoma, Mbonu, Chinedu, Chukwuneke, Chiamaka, Lal, Daisy, Ezeani, Ignatius, Rayson, Paul, Onwuzulike, Ijemma, Okeke, Chukwuma, Nweya, Gerald, Ogbonna, Bright, Oraegbunam, Chukwuebuka, Awo-Ndubuisi, Esther Chidinma, Osuagwu, Akudo Amarachukwu, Nmezi, Obioha
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2405.00997
author Emezue, Chris Chinenye
Okoh, Ifeoma
Mbonu, Chinedu
Chukwuneke, Chiamaka
Lal, Daisy
Ezeani, Ignatius
Rayson, Paul
Onwuzulike, Ijemma
Okeke, Chukwuma
Nweya, Gerald
Ogbonna, Bright
Oraegbunam, Chukwuebuka
Awo-Ndubuisi, Esther Chidinma
Osuagwu, Akudo Amarachukwu
Nmezi, Obioha
contents The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study. This highlights the need to develop language technologies for Igbo to foster communication, learning and preservation. To create robust, impactful, and widely adopted language technologies for Igbo, it is essential to incorporate the multi-dialectal nature of the language. The primary obstacle to achieving dialect-aware language technologies is the lack of comprehensive dialectal datasets. In response, we present the IgboAPI dataset, a multi-dialectal Igbo-English dictionary dataset developed to enhance the representation of Igbo dialects. Furthermore, we illustrate the practicality of the IgboAPI dataset through two distinct studies: one on an Igbo semantic lexicon and the other on machine translation. In the semantic lexicon study, we establish an initial Igbo semantic lexicon for the Igbo semantic tagger, while in the machine translation study, we demonstrate that fine-tuning existing machine translation systems on the IgboAPI dataset significantly improves their ability to handle dialectal variations in sentences.
format Preprint
id arxiv_https___arxiv_org_abs_2405_00997
institution arXiv
publishDate 2024
record_format arxiv
title The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment
topic Computation and Language
url https://arxiv.org/abs/2405.00997