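Staff View: When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts :: Library Catalog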

Bibliographic Details
Main Authors:	Kim, Jun Seong; Thu, Kyaw Ye; Ismayilzada, Javad; Park, Junyeong; Kim, Eunsu; Ahmad, Huzama; An, Na Min; Thorne, James; Oh, Alice
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.16826

_version_	1866916658902204416
author	Kim, Jun Seong; Thu, Kyaw Ye; Ismayilzada, Javad; Park, Junyeong; Kim, Eunsu; Ahmad, Huzama; An, Na Min; Thorne, James; Oh, Alice
author_facet	Kim, Jun Seong; Thu, Kyaw Ye; Ismayilzada, Javad; Park, Junyeong; Kim, Eunsu; Ahmad, Huzama; An, Na Min; Thorne, James; Oh, Alice
contents	In a highly globalized world, it is important for multi-modal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (a Korean food) in an image both when an Asian woman is eating it and when an African man is eating it. However, current MLLMs over-rely on the visual features of the person, which leads to misclassification of the entities. To examine the robustness of MLLMs to different ethnicities, we introduce MixCuBe, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbation for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to a 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures. Our dataset is publicly available at: https://huggingface.co/datasets/kyawyethu/MixCuBe.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_16826
institution	arXiv
publishDate	2025
record_format	arxiv
title	When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts
topic	Computation and Language
url	https://arxiv.org/abs/2503.16826

Similar Items