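| Main Authors: | Alwajih, Fakhraddin; Magdy, Samar M.; Mekki, Abdellah El; Nacar, Omer; Nafea, Youssef; Abdelfadil, Safaa Taher; Yahya, Abdulfattah Mohammed; Luqman, Hamzah; Almarwani, Nada; Aloufi, Samah; Qawasmen, Baraah; Atou, Houdaifa; Sibaee, Serry; Alsayadi, Hamzah A.; Al-Dhabyani, Walid; Al-shaibani, Maged S.; Aatar, Aya El; Qandos, Nour; Alhamouri, Rahaf; Ahmad, Samar; Al-Ghrawi, Mohammed Anwar; Yacoub, Aminetou; AbuHweidi, Ruwa; Lemin, Vatimetou Mohamed; Abdel-Salam, Reem; Bashiti, Ahlam; Alansari, Aisha; Ashraf, Ahmed; Alturayeif, Nora; Inciarte, Alcides Alcoba; Ammar, Adel; Elmadany, Abdelrahim A.; Tourad, Mohamedou Cheikh; Berrada, Ismail; Jarrar, Mustafa; Shehata, Shady; Abdul-Mageed, Muhammad |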
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2505.21979 |
| _version_ | 1866908561884315648 |
|---|---|
| author | Alwajih, Fakhraddin; Magdy, Samar M.; Mekki, Abdellah El; Nacar, Omer; Nafea, Youssef; Abdelfadil, Safaa Taher; Yahya, Abdulfattah Mohammed; Luqman, Hamzah; Almarwani, Nada; Aloufi, Samah; Qawasmen, Baraah; Atou, Houdaifa; Sibaee, Serry; Alsayadi, Hamzah A.; Al-Dhabyani, Walid; Al-shaibani, Maged S.; Aatar, Aya El; Qandos, Nour; Alhamouri, Rahaf; Ahmad, Samar; Al-Ghrawi, Mohammed Anwar; Yacoub, Aminetou; AbuHweidi, Ruwa; Lemin, Vatimetou Mohamed; Abdel-Salam, Reem; Bashiti, Ahlam; Alansari, Aisha; Ashraf, Ahmed; Alturayeif, Nora; Inciarte, Alcides Alcoba; Ammar, Adel; Elmadany, Abdelrahim A.; Tourad, Mohamedou Cheikh; Berrada, Ismail; Jarrar, Mustafa; Shehata, Shady; Abdul-Mageed, Muhammad |
| contents | Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2505_21979 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2505.21979 |