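| Main Authors: | Akera, Benjamin; Ouma, Evelyn Nafula; Yiga, Gilbert; Walukagga, Patrick; Natukunda, Phionah; Saaka, Trevor; Nsumba, Solomon; Nabukeera, Lilian Teddy; Muhanguzi, Joel; Sekalala, Imran; Namara, Nimpamya Janat; Bainomugisha, Engineer; Mwebaze, Ernest; Quinn, John |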
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2510.07203 |
| _version_ | 1866908582027460608 |
|---|---|
| author | Akera, Benjamin; Ouma, Evelyn Nafula; Yiga, Gilbert; Walukagga, Patrick; Natukunda, Phionah; Saaka, Trevor; Nsumba, Solomon; Nabukeera, Lilian Teddy; Muhanguzi, Joel; Sekalala, Imran; Namara, Nimpamya Janat; Bainomugisha, Engineer; Mwebaze, Ernest; Quinn, John |
| author_facet | Akera, Benjamin; Ouma, Evelyn Nafula; Yiga, Gilbert; Walukagga, Patrick; Natukunda, Phionah; Saaka, Trevor; Nsumba, Solomon; Nabukeera, Lilian Teddy; Muhanguzi, Joel; Sekalala, Imran; Namara, Nimpamya Janat; Bainomugisha, Engineer; Mwebaze, Ernest; Quinn, John |
| contents | There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading LLMs exhibit strong performance on a number of the most common languages (e.g. Swahili or Yoruba), but prioritise support for the languages with the most speakers first, resulting in piecemeal ability across disparate languages. We contend that a regionally focussed approach is more efficient, and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, a pair of models based on Qwen 3 with state of the art comprehension in the majority of all Ugandan languages. These models are open source and can be used to reduce language barriers in a number of important practical applications. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2510_07203 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Sunflower: A New Approach To Expanding Coverage of African Languages in Large Language Models |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2510.07203 |