:: Library Catalog

Bibliographic Details
Main Authors:	Kiessling, Benjamin; Kurin, Gennady; Miller, Matthew Thomas; Smail, Kader
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2402.10943

Similar Items

KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
by: Gagnier, Henry, et al.
Published: (2026)

GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
by: Kargaran, Amir Hossein, et al.
Published: (2026)

Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
by: Hennara, Khalil, et al.
Published: (2025)

Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction
by: Rashad, Mohamed
Published: (2024)

DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
by: Cardoso, Gabriel Pimenta de Freitas, et al.
Published: (2026)

Cross-Lingual SynthDocs: A Large-Scale Synthetic Corpus for Any to Arabic OCR and Document Understanding
by: Al-Homoud, Haneen, et al.
Published: (2025)

PubMed-OCR: PMC Open Access OCR Annotations
by: Heidenreich, Hunter, et al.
Published: (2026)

olmOCR 2: Unit Test Rewards for Document OCR
by: Poznanski, Jake, et al.
Published: (2025)

PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy
by: Guan, Shuhao, et al.
Published: (2025)

KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
by: Heakl, Ahmed, et al.
Published: (2025)

RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages
by: Kashid, Harshvivek, et al.
Published: (2024)

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
by: Luo, Run, et al.
Published: (2025)

Towards Deployable OCR models for Indic languages
by: Mathew, Minesh, et al.
Published: (2022)

Improving OCR for Historical Texts of Multiple Languages
by: Westerdijk, Hylke, et al.
Published: (2025)

OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models
by: Liu, Yuliang, et al.
Published: (2023)

Seeing Straight: Document Orientation Detection for Efficient OCR
by: Goswami, Suranjan, et al.
Published: (2025)

ReceiptSense: Beyond Traditional OCR -- A Dataset for Receipt Understanding
by: Abdallah, Abdelrahman, et al.
Published: (2024)

Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
by: Liang, Yunhao, et al.
Published: (2026)

CML-Bench: A Framework for Evaluating and Enhancing LLM-Powered Movie Scripts Generation
by: Zheng, Mingzhe, et al.
Published: (2025)

Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts
by: Chen, Yingfa, et al.
Published: (2024)

Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols
by: Padó, Sebastian, et al.
Published: (2025)

Open-Source Image Editing Models Are Zero-Shot Vision Learners
by: Liu, Wei, et al.
Published: (2026)

AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)

AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks
by: Eltahir, Mohamed, et al.
Published: (2025)

Confidence-Aware Document OCR Error Detection
by: Hemmer, Arthur, et al.
Published: (2024)

Reference-Based Post-OCR Processing with LLM for Precise Diacritic Text in Historical Document Recognition
by: Do, Thao, et al.
Published: (2024)

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing
by: Tang, Zihan, et al.
Published: (2026)

Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
by: Luo, Fuwen, et al.
Published: (2026)

Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library
by: Tarride, Solène, et al.
Published: (2024)

LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
by: Ye, Maoyuan, et al.
Published: (2025)

Multimedia Generative Script Learning for Task Planning
by: Wang, Qingyun, et al.
Published: (2022)

QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
by: Le, Binh M., et al.
Published: (2025)

AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
by: Momayiz, Imane, et al.
Published: (2026)

Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
by: Zhang, Jianing, et al.
Published: (2026)

Cross-Language Learning within Arabic Script for Low-Resource HTR
by: Al-azzawi, Sana, et al.
Published: (2026)

Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function
by: Aslam, Muhammad Azeem, et al.
Published: (2024)

Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
by: Zhong, Yufeng, et al.
Published: (2025)

Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset
by: Majeed, Ameer, et al.
Published: (2024)

SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation
by: Khalil, Mohammad Amer, et al.
Published: (2026)

Bridging the Gap: Fusing CNNs and Transformers to Decode the Elegance of Handwritten Arabic Script
by: Boufenar, Chaouki, et al.
Published: (2025)