Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Alwajih, Fakhraddin; Magdy, Samar M.; Mekki, Abdellah El; Nacar, Omer; Nafea, Youssef; Abdelfadil, Safaa Taher; Yahya, Abdulfattah Mohammed; Luqman, Hamzah; Almarwani, Nada; Aloufi, Samah; Qawasmen, Baraah; Atou, Houdaifa; Sibaee, Serry; Alsayadi, Hamzah A.; Al-Dhabyani, Walid; Al-shaibani, Maged S.; Aatar, Aya El; Qandos, Nour; Alhamouri, Rahaf; Ahmad, Samar; Al-Ghrawi, Mohammed Anwar; Yacoub, Aminetou; AbuHweidi, Ruwa; Lemin, Vatimetou Mohamed; Abdel-Salam, Reem; Bashiti, Ahlam; Alansari, Aisha; Ashraf, Ahmed; Alturayeif, Nora; Inciarte, Alcides Alcoba; Ammar, Adel; Elmadany, Abdelrahim A.; Tourad, Mohamedou Cheikh; Berrada, Ismail; Jarrar, Mustafa; Shehata, Shady; Abdul-Mageed, Muhammad
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.21979

_version_	1866908561884315648
author	Alwajih, Fakhraddin; Magdy, Samar M.; Mekki, Abdellah El; Nacar, Omer; Nafea, Youssef; Abdelfadil, Safaa Taher; Yahya, Abdulfattah Mohammed; Luqman, Hamzah; Almarwani, Nada; Aloufi, Samah; Qawasmen, Baraah; Atou, Houdaifa; Sibaee, Serry; Alsayadi, Hamzah A.; Al-Dhabyani, Walid; Al-shaibani, Maged S.; Aatar, Aya El; Qandos, Nour; Alhamouri, Rahaf; Ahmad, Samar; Al-Ghrawi, Mohammed Anwar; Yacoub, Aminetou; AbuHweidi, Ruwa; Lemin, Vatimetou Mohamed; Abdel-Salam, Reem; Bashiti, Ahlam; Alansari, Aisha; Ashraf, Ahmed; Alturayeif, Nora; Inciarte, Alcides Alcoba; Ammar, Adel; Elmadany, Abdelrahim A.; Tourad, Mohamedou Cheikh; Berrada, Ismail; Jarrar, Mustafa; Shehata, Shady; Abdul-Mageed, Muhammad
contents	Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally informed multimodal modeling research. All datasets and benchmarks are publicly available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_21979
institution	arXiv
publishDate	2025
record_format	arxiv
title	Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
topic	Computation and Language
url	https://arxiv.org/abs/2505.21979

Similar Items