Bibliographic Details
Main authors: Khalak, Abdulmuizz; Issam, Abderrahmane; Spanakis, Gerasimos
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online access: https://arxiv.org/abs/2602.09826
author Khalak, Abdulmuizz
Issam, Abderrahmane
Spanakis, Gerasimos
contents Arabic Language Models (LMs) are pretrained predominantly on Modern Standard Arabic (MSA) and are expected to transfer to its dialects. While MSA, the standard written variety, is commonly used in formal settings, people speak and write online in a range of dialects spread across the Arab region. This poses a limitation for Arabic LMs, since the dialects vary in their similarity to MSA. In this work, we study cross-lingual transfer in Arabic models using probing on three Natural Language Processing (NLP) tasks, as well as representational similarity. Our results indicate that transfer is possible but uneven across dialects, which we find to be partially explained by their geographic proximity. Furthermore, we find evidence of negative interference in models trained to support all Arabic dialects. This calls the dialects' degree of similarity into question and raises concerns for cross-lingual transfer in Arabic models.
format Preprint
id arxiv_https___arxiv_org_abs_2602_09826
institution arXiv
publishDate 2026
record_format arxiv
title From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models
topic Computation and Language
url https://arxiv.org/abs/2602.09826