Bibliographic Details
Main Authors: Das, Paramita; Roy, Amartya; Chakraborty, Ritabrata; Mukherjee, Animesh
Format: Preprint
Published: 2024
Subjects: Computation and Language; Information Retrieval; Machine Learning
Online Access: https://arxiv.org/abs/2412.05708
author Das, Paramita
Roy, Amartya
Chakraborty, Ritabrata
Mukherjee, Animesh
contents Although Wikipedia is the largest multilingual encyclopedia, it remains inherently incomplete. Content quality differs significantly between high-resource languages (HRLs, e.g., English) and low-resource languages (LRLs, e.g., Hindi), and many LRL articles lack adequate information. To bridge these content gaps, we propose a lightweight framework to enhance knowledge equity between English and Hindi. When the English Wikipedia page is not up to date, our framework extracts relevant information from readily available external resources (such as English books) and adapts it to Wikipedia's distinctive style, including its neutral point of view (NPOV) policy, using the in-context learning capabilities of large language models. The adapted content is then machine-translated into Hindi for integration into the corresponding Wikipedia articles. If, on the other hand, the English version is comprehensive and up to date, the framework transfers knowledge directly from English to Hindi. Our framework effectively generates new content for Hindi Wikipedia sections, enhancing Hindi Wikipedia articles by 65% and 62% according to automatic and human-judgment-based evaluations, respectively.
format Preprint
id arxiv_https___arxiv_org_abs_2412_05708
institution arXiv
publishDate 2024
record_format arxiv
title On the effective transfer of knowledge from English to Hindi Wikipedia
topic Computation and Language
Information Retrieval
Machine Learning
url https://arxiv.org/abs/2412.05708