Bibliographic details
Main authors: Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online access: https://arxiv.org/abs/2406.10318
_version_ 1866909224586444800
author Zhang, Tuo
Feng, Tiantian
Ni, Yibin
Cao, Mengqin
Liu, Ruying
Butler, Katharine
Weng, Yanjun
Zhang, Mi
Narayanan, Shrikanth S.
Avestimehr, Salman
contents Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explaining the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of VLMs that can better understand and interpret culturally specific content, promoting greater inclusiveness beyond English-based corpora.
format Preprint
id arxiv_https___arxiv_org_abs_2406_10318
institution arXiv
publishDate 2024
record_format arxiv
title Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2406.10318