MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.10318

_version_	1866909224586444800
author	Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman
author_facet	Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman
contents	Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explaining the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of VLMs that can better understand and interpret culturally specific content, promoting greater inclusiveness beyond English-based corpora.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_10318
institution	arXiv
publishDate	2024
record_format	arxiv
title	Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2406.10318
