Detailed bibliography
Main author: Rathnayaka, Maheesha Sewmini
Format: Digital resource
Language: English
Published: Zenodo, 2026
Online access: https://doi.org/10.5281/zenodo.19878034
Contents:
  • <p>This dataset contains images of Sinhala handwritten characters, words, and paragraphs, covering all Sinhala characters and connectors. It includes 25,516 scanned images collected from 381 participants, representing diverse handwriting styles across different demographic groups.</p> <p>The dataset was developed as part of a research project on Sinhala handwritten text recognition.</p> <p>The dataset is annotated at the character, word, and paragraph levels and includes linked participant metadata (age, gender, education/occupation category, residence type, and dominant hand) keyed by anonymized participant IDs. This demographic information is stored separately in an Excel file (<code>MetaData.xlsx</code>) to preserve anonymity.</p> <p>The images were scanned at 300 DPI and split into training, validation, and test sets. Data augmentation was applied only to the training set to improve model generalization. The <code>Raw</code> folder contains the original training, validation, and test sets, while the <code>Augmented</code> folder contains the augmented version of the training set.</p> <p>An included <code>README.md</code> file documents the dataset structure, the data augmentation procedures, and usage instructions.</p> <p>The dataset is intended for developing and evaluating machine learning and deep learning models for Sinhala handwritten text recognition, including applications such as optical character recognition (OCR), image classification, and sequence modeling.</p> <p>For additional details, updates, and related resources, please refer to the GitHub repository: <a href="https://github.com/MaheeshaSewmini/Sinhala-OCR-Dataset">https://github.com/MaheeshaSewmini/Sinhala-OCR-Dataset</a></p>
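As a quick sanity check after downloading, one might count the images in each split of the Raw folder (or in the Augmented folder) and compare the totals with the documented 25,516 images. The sketch below is a minimal, hypothetical example using only the Python standard library. It assumes that each first-level subfolder of the chosen root is one split (for example "train", "val", "test") and that images use common raster extensions. Both the folder names and the extension list are assumptions; the actual layout is described in the dataset's README.md.

```python
from collections import Counter
from pathlib import Path

# Assumed (hypothetical) set of image extensions; adjust to match the README.md.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_images(root: Path) -> Counter:
    """Count image files per first-level subfolder (assumed to be a split) of root."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if not (path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES):
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # skip images lying directly in root, outside any split folder
        # Nested folders (e.g. train/words/...) are credited to their top-level split.
        counts[rel.parts[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical local path to the extracted Raw folder.
    raw_root = Path("Sinhala-OCR-Dataset/Raw")
    if raw_root.is_dir():
        for split, n in sorted(count_images(raw_root).items()):
            print(f"{split}: {n} images")
```

Counting by top-level folder keeps the check independent of how characters, words, and paragraphs are nested within each split.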