| Main Authors: | Kiessling, Benjamin, Kurin, Gennady, Miller, Matthew Thomas, Smail, Kader |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.10943 |
Similar Items
KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
by: Gagnier, Henry, et al.
Published: (2026)
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
by: Kargaran, Amir Hossein, et al.
Published: (2026)
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
by: Hennara, Khalil, et al.
Published: (2025)
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction
by: Rashad, Mohamed
Published: (2024)
DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
by: Cardoso, Gabriel Pimenta de Freitas, et al.
Published: (2026)
Cross-Lingual SynthDocs: A Large-Scale Synthetic Corpus for Any to Arabic OCR and Document Understanding
by: Al-Homoud, Haneen, et al.
Published: (2025)
PubMed-OCR: PMC Open Access OCR Annotations
by: Heidenreich, Hunter, et al.
Published: (2026)
olmOCR 2: Unit Test Rewards for Document OCR
by: Poznanski, Jake, et al.
Published: (2025)
PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy
by: Guan, Shuhao, et al.
Published: (2025)
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
by: Heakl, Ahmed, et al.
Published: (2025)
RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages
by: Kashid, Harshvivek, et al.
Published: (2024)
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
by: Luo, Run, et al.
Published: (2025)
Towards Deployable OCR models for Indic languages
by: Mathew, Minesh, et al.
Published: (2022)
Improving OCR for Historical Texts of Multiple Languages
by: Westerdijk, Hylke, et al.
Published: (2025)
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models
by: Liu, Yuliang, et al.
Published: (2023)
Seeing Straight: Document Orientation Detection for Efficient OCR
by: Goswami, Suranjan, et al.
Published: (2025)
ReceiptSense: Beyond Traditional OCR -- A Dataset for Receipt Understanding
by: Abdallah, Abdelrahman, et al.
Published: (2024)
Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
by: Liang, Yunhao, et al.
Published: (2026)
CML-Bench: A Framework for Evaluating and Enhancing LLM-Powered Movie Scripts Generation
by: Zheng, Mingzhe, et al.
Published: (2025)
Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts
by: Chen, Yingfa, et al.
Published: (2024)
Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols
by: Padó, Sebastian, et al.
Published: (2025)
Open-Source Image Editing Models Are Zero-Shot Vision Learners
by: Liu, Wei, et al.
Published: (2026)
AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)
AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks
by: Eltahir, Mohamed, et al.
Published: (2025)
Confidence-Aware Document OCR Error Detection
by: Hemmer, Arthur, et al.
Published: (2024)
Reference-Based Post-OCR Processing with LLM for Precise Diacritic Text in Historical Document Recognition
by: Do, Thao, et al.
Published: (2024)
FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing
by: Tang, Zihan, et al.
Published: (2026)
Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
by: Luo, Fuwen, et al.
Published: (2026)
Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library
by: Tarride, Solène, et al.
Published: (2024)
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
by: Ye, Maoyuan, et al.
Published: (2025)
Multimedia Generative Script Learning for Task Planning
by: Wang, Qingyun, et al.
Published: (2022)
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
by: Le, Binh M., et al.
Published: (2025)
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
by: Momayiz, Imane, et al.
Published: (2026)
Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
by: Zhang, Jianing, et al.
Published: (2026)
Cross-Language Learning within Arabic Script for Low-Resource HTR
by: Al-azzawi, Sana, et al.
Published: (2026)
Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function
by: Aslam, Muhammad Azeem, et al.
Published: (2024)
Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
by: Zhong, Yufeng, et al.
Published: (2025)
Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset
by: Majeed, Ameer, et al.
Published: (2024)
SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation
by: Khalil, Mohammad Amer, et al.
Published: (2026)
Bridging the Gap: Fusing CNNs and Transformers to Decode the Elegance of Handwritten Arabic Script
by: Boufenar, Chaouki, et al.
Published: (2025)