author Alwajih, Fakhraddin
Magdy, Samar M.
Mekki, Abdellah El
Nacar, Omer
Nafea, Youssef
Abdelfadil, Safaa Taher
Yahya, Abdulfattah Mohammed
Luqman, Hamzah
Almarwani, Nada
Aloufi, Samah
Qawasmen, Baraah
Atou, Houdaifa
Sibaee, Serry
Alsayadi, Hamzah A.
Al-Dhabyani, Walid
Al-shaibani, Maged S.
Aatar, Aya El
Qandos, Nour
Alhamouri, Rahaf
Ahmad, Samar
Al-Ghrawi, Mohammed Anwar
Yacoub, Aminetou
AbuHweidi, Ruwa
Lemin, Vatimetou Mohamed
Abdel-Salam, Reem
Bashiti, Ahlam
Alansari, Aisha
Ashraf, Ahmed
Alturayeif, Nora
Inciarte, Alcides Alcoba
Ammar, Adel
Elmadany, Abdelrahim A.
Tourad, Mohamedou Cheikh
Berrada, Ismail
Jarrar, Mustafa
Shehata, Shady
Abdul-Mageed, Muhammad
contents Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally informed multimodal modeling research. All datasets and benchmarks are publicly available.
format Preprint
id arxiv_https___arxiv_org_abs_2505_21979
institution arXiv
publishDate 2025
record_format arxiv
title Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
topic Computation and Language
url https://arxiv.org/abs/2505.21979