Bibliographic Details
Main Authors: Ranjan, Mukul; Jha, Prince; Kumari, Khushboo; Shen, Zhiqiang
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2605.15071
author Ranjan, Mukul
Jha, Prince
Kumari, Khushboo
Shen, Zhiqiang
contents Vision-Language Models (VLMs) are increasingly applied to cultural heritage materials, from digital archives to educational platforms. This work identifies a fundamental issue in how these models interpret historical artifacts. We define this phenomenon as cultural anachronism, the tendency to misinterpret historical objects using temporally inappropriate concepts, materials, or cultural frameworks. To quantify this phenomenon, we introduce the Temporal Anachronism Benchmark for Vision-Language Models (TAB-VLM), a dataset of 600 questions across six categories, designed to evaluate temporal reasoning on 1,600 Indian cultural artifacts spanning prehistoric to modern periods. Systematic evaluations of ten state-of-the-art models reveal significant deficiencies on our benchmark, and even the best model (GPT-5.2) achieves only 58.7% overall accuracy. The performance gap persists across varying architectures and scales, suggesting that cultural anachronism represents a significant limitation in visual AI systems, regardless of model size. These findings highlight the disparity between current VLM capabilities and the requirements for accurately interpreting cultural heritage materials, particularly for non-Western visual cultures underrepresented in training data. Our benchmark provides a foundation for enhancing temporal cognition in multimodal AI systems that interact with historical artifacts. The dataset and code are available on our project page.
format Preprint
id arxiv_https___arxiv_org_abs_2605_15071
institution arXiv
publishDate 2026
record_format arxiv
title On the Cultural Anachronism and Temporal Reasoning in Vision Language Models
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2605.15071