Bibliographic Details
Main Author: Mbengue, Ndeye-Emilie
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2605.05931
_version_ 1866910197571649536
author Mbengue, Ndeye-Emilie
author_facet Mbengue, Ndeye-Emilie
contents Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from participating in the global digital transformation. In this PhD proposal, we aim to address this gap, focusing on the language coverage of Linked Open Data knowledge graphs (LOD KGs). First, we identify key variables that characterize language distribution in LOD, including the number of Wikipedia articles per language edition and the number of language-tagged entities in LOD KGs. These variables are analyzed across three major multilingual LOD KGs (DBpedia, BabelNet, and Wikidata), providing insights into the representation and distribution of languages within LOD. Building on this analysis, we intend to study the impact of cross-lingual transfer candidate selection on the task of multilingual KG completion. In particular, we plan to investigate strategies based on linguistic proximity and the availability of curated annotated alignments between languages. Language proximity also motivates us to explore analogical reasoning, which relies on (dis)similarities and has not yet been investigated as a way to identify correspondences across languages, with the goal of improving KG completion performance and enhancing language coverage in LOD.
format Preprint
id arxiv_https___arxiv_org_abs_2605_05931
institution arXiv
publishDate 2026
record_format arxiv
title In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs
topic Artificial Intelligence
url https://arxiv.org/abs/2605.05931