Staff View: Decoding Ancient Oracle Bone Script via Generative Dictionary Retrieval :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Yin; Zhang, Gangjian; Chen, Jiayu; Xu, Chang; Luo, Yuyu; Tang, Nan; Xiong, Hui
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.09668

_version_	1866914464618512384
author	Wu, Yin; Zhang, Gangjian; Chen, Jiayu; Xu, Chang; Luo, Yuyu; Tang, Nan; Xiong, Hui
author_facet	Wu, Yin; Zhang, Gangjian; Chen, Jiayu; Xu, Chang; Luo, Yuyu; Tang, Nan; Xiong, Hui
contents	Understanding humanity's earliest writing systems is crucial for reconstructing civilization's origins, yet many ancient scripts remain undeciphered. Oracle Bone Script (OBS) from China's Shang dynasty exemplifies this challenge: only approximately 1,500 of roughly 4,600 characters have been decoded, and a substantial portion of these 3,000-year-old inscriptions remains only partially understood. Limited by extreme data scarcity, existing computational methods achieve under 3% accuracy on unseen characters -- the core palaeographic challenge. We overcome this by reframing decipherment from classification to dictionary-based retrieval. Using deep learning guided by character evolution principles, we generate a comprehensive synthetic dictionary of plausible OBS variants for modern Chinese characters. Scholars query unknown inscriptions to retrieve visually similar candidates with transparent evidence, replacing algorithmic black boxes with interpretable hypotheses. Our approach achieves 54.3% Top-10 and 86.6% Top-50 accuracy for unseen characters. This scalable, transparent framework accelerates decipherment of a pivotal undeciphered script and establishes a generalizable methodology for AI-assisted archaeological discovery.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_09668
institution	arXiv
publishDate	2026
record_format	arxiv
title	Decoding Ancient Oracle Bone Script via Generative Dictionary Retrieval
topic	Information Retrieval; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.09668

Similar Items