Bibliographic Details
Main Authors: Bayramli, Zahra, Suleymanzade, Ayhan, An, Na Min, Ahmad, Huzama, Kim, Eunsu, Park, Junyeong, Thorne, James, Oh, Alice
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2502.08914
_version_ 1866912448297041920
author Bayramli, Zahra
Suleymanzade, Ayhan
An, Na Min
Ahmad, Huzama
Kim, Eunsu
Park, Junyeong
Thorne, James
Oh, Alice
contents Text-to-image diffusion models have recently enabled the creation of visually compelling, detailed images from textual prompts. However, their ability to accurately represent various cultural nuances remains an open question. In our work, we introduce the CultDiff benchmark, which evaluates whether state-of-the-art diffusion models can generate culturally specific images spanning ten countries. Through a fine-grained analysis of different similarity aspects, we show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions, and we find significant disparities in cultural relevance, description fidelity, and realism compared to real-world reference images. With the collected human evaluations, we develop a neural-based image-image similarity metric, CultDiff-S, to predict human judgment on real and generated images with cultural artifacts. Our work highlights the need for more inclusive generative AI systems and equitable dataset representation over a wide range of cultures.
format Preprint
id arxiv_https___arxiv_org_abs_2502_08914
institution arXiv
publishDate 2025
record_format arxiv
title Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2502.08914