Bibliographic details
Main author:	Vlachou Efstathiou, Malamatenia
Format:	Digital resource
Language:	French
Published:	Zenodo 2026
Subjects:	latin palaeography manuscript ancient french grandes chroniques de france henri de trévou raoulet d'orléans htr
Online access:	https://doi.org/10.5281/zenodo.18745702

Table of contents:

This repository contains the extended version of the ground truth for the codex <a href="https://gallica.bnf.fr/ark:/12148/btv1b84472995">Paris, BnF, fr. 2813</a>, used in the experiments for the paper “Leveraging Morphology for Metrological Historical Script Analysis”, accepted to the <a href="https://icdar2026.org/">International Conference on Document Analysis and Recognition</a> (ICDAR 2026, Vienna, Austria).

<h3>What’s New Compared to v.1</h3>
<ul>
<li>95 newly annotated folios have been added (see the new `btv1b84472995_metadata.csv` for details);</li>
<li>The ALTO XML annotations now distinguish between `MainZone#1` and `MainZone#2`, corresponding to the column order on each page;</li>
<li>Two versions of `annotation.json` are provided: one of them marks hyphenation where a word is broken at the end of a line.</li>
</ul>

As in version 1, the repository is organized into two main data folders:

---

<h2>`btv1b84472995_GT.zip`</h2>

This folder contains the ground truth dataset used for Handwritten Text Recognition (HTR), created from selected folios of the manuscript Paris, BnF, français 2813. The identifier `btv1b84472995` is the ARK identifier of this manuscript in <a href="https://gallica.bnf.fr/ark:/12148/btv1b84472995" rel="noopener">Gallica</a>.

Folder structure:

btv1b84472995_GT
├── images
└── annotations

- `images/`: high-resolution images of the selected folios, downloaded from Gallica. Image names follow the pattern `btv1b84472995_f<number>`, where `<number>` is the Gallica view number.
  ➤ Credit: *Source gallica.bnf.fr / Bibliothèque nationale de France*
- `annotations/`: ALTO XML annotation files created with <a href="https://escriptorium.inria.fr/">eScriptorium</a>.

Layout: annotations follow the <a href="https://segmonto.github.io/">Segmonto</a> ontology.
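Because each column now carries its own zone tag, the ALTO files can be read column by column. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the eScriptorium-style ALTO v4 export convention, in which zone and line types are declared as `OtherTag` elements (with `ID` and `LABEL`) and referenced from `TextBlock`/`TextLine` through the `TAGREFS` attribute. The function names and the tiny inline sample page are hypothetical.

```python
import xml.etree.ElementTree as ET

# Column order encoded by the custom zone tags; any other zone sorts after both columns.
COLUMN_RANK = {"MainZone#1": 0, "MainZone#2": 1}


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def resolve(el: ET.Element, labels: dict) -> str:
    """Translate an element's TAGREFS (space-separated tag IDs) into tag labels."""
    return " ".join(labels.get(ref, ref) for ref in el.get("TAGREFS", "").split())


def lines_in_column_order(alto_xml: str) -> list:
    """Return (zone, line_type, text) tuples, first column before second column.

    Assumption: types are declared as <OtherTag ID=... LABEL=...> and referenced
    from TextBlock/TextLine via TAGREFS (eScriptorium-style ALTO v4 export).
    """
    root = ET.fromstring(alto_xml)
    labels = {el.get("ID"): el.get("LABEL") for el in root.iter() if local(el.tag) == "OtherTag"}
    blocks = [(resolve(b, labels), b) for b in root.iter() if local(b.tag) == "TextBlock"]
    # sorted() is stable: blocks with the same rank keep their document order.
    blocks = sorted(blocks, key=lambda zb: COLUMN_RANK.get(zb[0], len(COLUMN_RANK)))
    result = []
    for zone, block in blocks:
        for line in block:
            if local(line.tag) != "TextLine":
                continue
            text = " ".join(s.get("CONTENT", "") for s in line if local(s.tag) == "String")
            result.append((zone, resolve(line, labels), text))
    return result


# Hypothetical two-column page; the second column is deliberately listed first.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <OtherTag ID="BT1" LABEL="MainZone#1"/>
    <OtherTag ID="BT2" LABEL="MainZone#2"/>
    <OtherTag ID="LT1" LABEL="DefaultLine"/>
    <OtherTag ID="LT2" LABEL="RubricLines"/>
  </Tags>
  <Layout><Page><PrintSpace>
    <TextBlock ID="b2" TAGREFS="BT2">
      <TextLine ID="l3" TAGREFS="LT1"><String CONTENT="second column"/></TextLine>
    </TextBlock>
    <TextBlock ID="b1" TAGREFS="BT1">
      <TextLine ID="l1" TAGREFS="LT2"><String CONTENT="rubric"/></TextLine>
      <TextLine ID="l2" TAGREFS="LT1"><String CONTENT="first column"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

for zone, line_type, text in lines_in_column_order(SAMPLE):
    print(zone, line_type, text)
# MainZone#1 RubricLines rubric
# MainZone#1 DefaultLine first column
# MainZone#2 DefaultLine second column
```

Lines inside a zone are kept in the order in which they appear in the file; if that order is not reliable in a given export, sorting by the `VPOS` attribute of each `TextLine` would be a reasonable extra step.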
Users of the ground truth should note that, in addition to Segmonto, we use the following custom tags:
- `RubricLines`: rubricated lines
- `HalfLines`: partial or incomplete lines
- `MainZone#1` and `MainZone#2`: column order on the page, instead of a single `MainZone`

Transcription: the dataset is <a href="https://zenodo.org/records/12743230">CATMuS</a>-compliant and follows a graphemic transcription approach.

---

<h2>`dataset.zip`</h2>

This folder contains the dataset used in the experiments described in the paper, which apply the DTLR architecture to paleography.

Folder structure:

dataset
├── images
└── annotation.json

- `images/`: one subfolder per manuscript page, each containing that page's polygonal line extractions (with alpha transparency).
- `annotation.json`: the annotation and metadata for each line.

`annotation.json` structure example:

```json
{
  "<image_id>": {
    "split": "train",
    "label": "A beautiful calico cat.",
    "line": "DefaultLine",
    "zone": "MainZone#1",
    "script": "RaouletOrleans",
    "folio": "1r",
    "gp": "GP1",
    "doc": "HT1"
  }
}
```

Fields:
- `<image_id>`: corresponds to the image names in the `images/` folders
- `label`: transcription text of the line
- `line`: type of line
- `zone`: type of zone in which the line is found
- `script`: identifier of the scribal hand
- `gp`: identified Graphic Profile

Papers associated with the data:
- v1: https://malamatenia.github.io/bnf-fr-2813/ (Scriptorium 2026)
- v2: https://malamatenia.github.io/dtlr-for-metrology/ (ICDAR 2026)

This study was supported by the CNRS through MITI and the 80|Prime program (CrEMe, Caractérisation des écritures médiévales), and by the European Research Council (ERC project DISCOVER, number 101076028).
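The `annotation.json` file in `dataset.zip` is a flat mapping from line image identifiers to per-line metadata, so selecting a split or counting lines per scribal hand or Graphic Profile only needs the standard library. The sketch below is illustrative and assumes the structure given in the example above; the helper names, the image IDs, the `test` split name, the `OtherHand` hand and the `GP2` profile in the sample are all hypothetical.

```python
import json
from collections import Counter


def load_annotation(path: str) -> dict:
    """Read annotation.json: a mapping from <image_id> to per-line metadata."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def select_split(annotation: dict, split: str) -> dict:
    """Keep only the lines whose "split" field equals `split` (e.g. "train")."""
    return {image_id: meta for image_id, meta in annotation.items() if meta.get("split") == split}


def count_by(annotation: dict, field: str) -> Counter:
    """Count lines per value of a metadata field such as "script" or "gp"."""
    return Counter(meta.get(field) for meta in annotation.values())


# Hypothetical entries mirroring the documented fields (values invented for illustration).
SAMPLE = {
    "img_a": {"split": "train", "label": "<transcription>", "line": "DefaultLine",
              "zone": "MainZone#1", "script": "RaouletOrleans", "folio": "1r", "gp": "GP1", "doc": "HT1"},
    "img_b": {"split": "train", "label": "<transcription>", "line": "RubricLines",
              "zone": "MainZone#2", "script": "RaouletOrleans", "folio": "1r", "gp": "GP2", "doc": "HT1"},
    "img_c": {"split": "train", "label": "<transcription>", "line": "DefaultLine",
              "zone": "MainZone#1", "script": "OtherHand", "folio": "2v", "gp": "GP1", "doc": "HT1"},
    "img_d": {"split": "test", "label": "<transcription>", "line": "DefaultLine",
              "zone": "MainZone#1", "script": "RaouletOrleans", "folio": "3r", "gp": "GP1", "doc": "HT1"},
}

train = select_split(SAMPLE, "train")
print(count_by(train, "script"))  # Counter({'RaouletOrleans': 2, 'OtherHand': 1})
print(count_by(train, "gp"))      # Counter({'GP1': 2, 'GP2': 1})
```

With the real archive, `load_annotation("dataset/annotation.json")` (path assumed from the folder structure above) would replace `SAMPLE`.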

Similar items