Bibliographic Details
Main Authors: Parsons, Stephen; Parker, C. Seth; Chapman, Christy; Hayashida, Mami; Seales, W. Brent
Format: Preprint
Published: 2023
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2304.02084
_version_ 1866909207242997760
author Parsons, Stephen
Parker, C. Seth
Chapman, Christy
Hayashida, Mami
Seales, W. Brent
author_facet Parsons, Stephen
Parker, C. Seth
Chapman, Christy
Hayashida, Mami
Seales, W. Brent
contents We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.
format Preprint
id arxiv_https___arxiv_org_abs_2304_02084
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT
Parsons, Stephen
Parker, C. Seth
Chapman, Christy
Hayashida, Mami
Seales, W. Brent
Computer Vision and Pattern Recognition
Machine Learning
title EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2304.02084