| Main Authors: | Momayiz, Imane, Elaouad, Soufiane Ait, Elmajjodi, Abdeljalil, Bouanane, Haitame |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.08070 |
Similar Items
DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
by: Cardoso, Gabriel Pimenta de Freitas, et al.
Published: (2026)
OCR-Quality: A Human-Annotated Dataset for OCR Quality Assessment
by: Zhang, Yulong
Published: (2025)
HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval
by: Most, Alexander, et al.
Published: (2025)
Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
by: Yu, Ya-Qi, et al.
Published: (2024)
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
by: Liu, Yuliang, et al.
Published: (2024)
Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR
by: Shu, Jing, et al.
Published: (2024)
Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions
by: Karamolegkou, Antonia, et al.
Published: (2026)
GutenOCR: A Grounded Vision-Language Front-End for Documents
by: Heidenreich, Hunter, et al.
Published: (2026)
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language
by: Haq, Ijazul, et al.
Published: (2025)
QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation
by: Wasfy, Ahmed, et al.
Published: (2025)
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis
by: Szankin, Maciej, et al.
Published: (2025)
InstructOCR: Instruction Boosting Scene Text Spotting
by: Duan, Chen, et al.
Published: (2024)
Automated Invoice Data Extraction: Using LLM and OCR
by: Khanchandani, Khushi, et al.
Published: (2025)
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction
by: Rashad, Mohamed
Published: (2024)
Evaluating OCR performance on food packaging labels in South Africa
by: Nagayi, Mayimunah, et al.
Published: (2025)
Confidence-Aware Document OCR Error Detection
by: Hemmer, Arthur, et al.
Published: (2024)
Ocean-OCR: Towards General OCR Application via a Vision-Language Model
by: Chen, Song, et al.
Published: (2025)
Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types
by: Rabby, AKM Shahariar Azad, et al.
Published: (2024)
Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems
by: Faraz, Ali, et al.
Published: (2026)
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
by: Shen, Zhixuan, et al.
Published: (2024)
Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
by: Tang, Haocheng, et al.
Published: (2026)
Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
by: Zhong, Yufeng, et al.
Published: (2025)
Error Patterns in Historical OCR: A Comparative Analysis of TrOCR and a Vision-Language Model
by: Vesalainen, Ari, et al.
Published: (2026)
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
by: Tien, Dong Nguyen, et al.
Published: (2025)
Evaluating Open-Source Vision Language Models for Facial Emotion Recognition against Traditional Deep Learning Models
by: Mulukutla, Vamsi Krishna, et al.
Published: (2025)
CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
by: Doris, Anna C., et al.
Published: (2025)
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
by: Taghadouini, Said, et al.
Published: (2026)
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
by: Zhang, Yulong, et al.
Published: (2025)
Mero Nagarikta: Advanced Nepali Citizenship Data Extractor with Deep Learning-Powered Text Detection and OCR
by: Dhakal, Sisir, et al.
Published: (2024)
Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR
by: Yuan, Ziye, et al.
Published: (2026)
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns
by: Zhang, Jiarui, et al.
Published: (2025)
OmniOCR: Generalist OCR for Ethnic Minority Languages
by: Liu, Bonan, et al.
Published: (2026)
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
by: Hung, Chia-Yu, et al.
Published: (2025)
TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision
by: Gillani, Syeda Anshrah, et al.
Published: (2025)
DianJin-OCR-R1: Enhancing OCR Capabilities via a Reasoning-and-Tool Interleaved Vision-Language Model
by: Chen, Qian, et al.
Published: (2025)
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
by: Zhong, Yufeng, et al.
Published: (2026)
JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers
by: Fujitake, Masato
Published: (2024)
Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records
by: Kim, Seorin, et al.
Published: (2025)