Bibliographic Details
Main Authors: Akera, Benjamin; Ouma, Evelyn Nafula; Yiga, Gilbert; Walukagga, Patrick; Natukunda, Phionah; Saaka, Trevor; Nsumba, Solomon; Nabukeera, Lilian Teddy; Muhanguzi, Joel; Sekalala, Imran; Namara, Nimpamya Janat; Bainomugisha, Engineer; Mwebaze, Ernest; Quinn, John
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2510.07203
_version_ 1866908582027460608
author Akera, Benjamin
Ouma, Evelyn Nafula
Yiga, Gilbert
Walukagga, Patrick
Natukunda, Phionah
Saaka, Trevor
Nsumba, Solomon
Nabukeera, Lilian Teddy
Muhanguzi, Joel
Sekalala, Imran
Namara, Nimpamya Janat
Bainomugisha, Engineer
Mwebaze, Ernest
Quinn, John
contents There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading LLMs perform strongly on a number of the most common languages (e.g. Swahili or Yoruba), but they prioritise support for the languages with the most speakers, resulting in piecemeal ability across disparate languages. We contend that a regionally focussed approach is more efficient, and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, a pair of models based on Qwen 3 with state-of-the-art comprehension in the majority of Ugandan languages. These models are open source and can be used to reduce language barriers in a number of important practical applications.
format Preprint
id arxiv_https___arxiv_org_abs_2510_07203
institution arXiv
publishDate 2025
record_format arxiv
title Sunflower: A New Approach To Expanding Coverage of African Languages in Large Language Models
topic Computation and Language
url https://arxiv.org/abs/2510.07203