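| Main Authors: | Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman |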
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2406.10318 |
| _version_ | 1866909224586444800 |
|---|---|
| author | Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman |
| author_facet | Zhang, Tuo; Feng, Tiantian; Ni, Yibin; Cao, Mengqin; Liu, Ruying; Butler, Katharine; Weng, Yanjun; Zhang, Mi; Narayanan, Shrikanth S.; Avestimehr, Salman |
| contents | Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explaining the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of VLMs that can better understand and interpret culturally specific content, promoting greater inclusiveness beyond English-based corpora. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_10318 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding |
| topic | Computer Vision and Pattern Recognition; Artificial Intelligence |
| url | https://arxiv.org/abs/2406.10318 |