| Main Authors: | Szankin, Maciej, Venkatasamy, Vidhyananth, Ying, Lihang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.11730 |
Similar Items
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
by: Momayiz, Imane, et al.
Published: (2026)
Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)
OCR-Quality: A Human-Annotated Dataset for OCR Quality Assessment
by: Zhang, Yulong
Published: (2025)
HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)
DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
by: Cardoso, Gabriel Pimenta de Freitas, et al.
Published: (2026)
From Sora What We Can See: A Survey of Text-to-Video Generation
by: Sun, Rui, et al.
Published: (2024)
CFIS-YOLO: A Lightweight Multi-Scale Fusion Network for Edge-Deployable Wood Defect Detection
by: Kang, Jincheng, et al.
Published: (2025)
Vision-Language Models for Edge Networks: A Comprehensive Survey
by: Sharshar, Ahmed, et al.
Published: (2025)
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
by: Liu, Yuliang, et al.
Published: (2024)
On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)
Towards Efficient Image Deblurring for Edge Deployment
by: Miriyala, Srinivas, et al.
Published: (2026)
Large Sign Language Models: Toward 3D American Sign Language Translation
by: Zhang, Sen, et al.
Published: (2025)
Automated Invoice Data Extraction: Using LLM and OCR
by: Khanchandani, Khushi, et al.
Published: (2025)
InstructOCR: Instruction Boosting Scene Text Spotting
by: Duan, Chen, et al.
Published: (2024)
Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis
by: Naeem, Shahzeb, et al.
Published: (2024)
From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety
by: Sethupathy, Ganen, et al.
Published: (2026)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment
by: Wu, Jiaqi, et al.
Published: (2024)
Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics
by: Yang, Haiyu, et al.
Published: (2026)
Evaluating OCR performance on food packaging labels in South Africa
by: Nagayi, Mayimunah, et al.
Published: (2025)
Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
by: Xu, Yiwen, et al.
Published: (2026)
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language
by: Haq, Ijazul, et al.
Published: (2025)
QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation
by: Wasfy, Ahmed, et al.
Published: (2025)
Confidence-Aware Document OCR Error Detection
by: Hemmer, Arthur, et al.
Published: (2024)
KAN See In the Dark
by: Ning, Aoxiang, et al.
Published: (2024)
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval
by: Most, Alexander, et al.
Published: (2025)
Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems
by: Faraz, Ali, et al.
Published: (2026)
VB: Visibility Benchmark for Visibility and Perspective Reasoning in Images
by: Tripathi, Neil
Published: (2026)
Deploying and Evaluating Multiple Deep Learning Models on Edge Devices for Diabetic Retinopathy Detection
by: Asare, Akwasi, et al.
Published: (2025)
Sign Spotting Disambiguation using Large Language Models
by: Low, JianHe, et al.
Published: (2025)
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
by: Yu, Ya-Qi, et al.
Published: (2024)
Multimodal Language Models See Better When They Look Shallower
by: Chen, Haoran, et al.
Published: (2025)
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
by: Talon, Davide, et al.
Published: (2025)
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
by: Shen, Zhixuan, et al.
Published: (2024)
Automated Wicket-Taking Delivery Segmentation and Trajectory-Based Dismissal-Zone Analysis in Cricket Videos Using OCR-Guided YOLOv8
by: Karmoker, Joy, et al.
Published: (2025)
SignMouth: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion
by: Wu, Wenfang, et al.
Published: (2025)
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
by: Huang, Zheng, et al.
Published: (2025)
Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
by: Zhong, Yufeng, et al.
Published: (2025)
Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories
by: Pandiani, Delfina Sol Martinez, et al.
Published: (2023)
Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR
by: Shu, Jing, et al.
Published: (2024)
Modeling Intensification for Sign Language Generation: A Computational Approach
by: İnan, Mert, et al.
Published: (2022)