Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Singhal, Aryan, Shao, Veronica, Sun, Gary, Ding, Ryan, Lu, Jonathan, Zhu, Kevin
Format:	Preprint
Veröffentlicht:	2024
Schlagworte:	Computation and Language
Online-Zugang:	https://arxiv.org/abs/2410.10303
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866929541005443072
author	Singhal, Aryan Shao, Veronica Sun, Gary Ding, Ryan Lu, Jonathan Zhu, Kevin
author_facet	Singhal, Aryan Shao, Veronica Sun, Gary Ding, Ryan Lu, Jonathan Zhu, Kevin
contents	The rise of digital misinformation has heightened interest in using multilingual Large Language Models (LLMs) for fact-checking. This study systematically evaluates translation bias and the effectiveness of LLMs for cross-lingual claim verification across 15 languages from five language families: Romance, Slavic, Turkic, Indo-Aryan, and Kartvelian. Using the XFACT dataset to assess their impact on accuracy and bias, we investigate two distinct translation methods: pre-translation and self-translation. We use mBERT's performance on the English dataset as a baseline to compare language-specific accuracies. Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data. Furthermore, larger models demonstrate superior performance in self-translation, improving translation accuracy and reducing bias. These results highlight the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools and minimize the risk of spreading misinformation in different linguistic contexts.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_10303
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification Singhal, Aryan Shao, Veronica Sun, Gary Ding, Ryan Lu, Jonathan Zhu, Kevin Computation and Language The rise of digital misinformation has heightened interest in using multilingual Large Language Models (LLMs) for fact-checking. This study systematically evaluates translation bias and the effectiveness of LLMs for cross-lingual claim verification across 15 languages from five language families: Romance, Slavic, Turkic, Indo-Aryan, and Kartvelian. Using the XFACT dataset to assess their impact on accuracy and bias, we investigate two distinct translation methods: pre-translation and self-translation. We use mBERT's performance on the English dataset as a baseline to compare language-specific accuracies. Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data. Furthermore, larger models demonstrate superior performance in self-translation, improving translation accuracy and reducing bias. These results highlight the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools and minimize the risk of spreading misinformation in different linguistic contexts.
title	A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification
topic	Computation and Language
url	https://arxiv.org/abs/2410.10303

Ähnliche Einträge