:: Library Catalog

Bibliographic Details
Main Authors:	Szankin, Maciej; Venkatasamy, Vidhyananth; Ying, Lihang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.11730

Similar Items

AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
by: Momayiz, Imane, et al.
Published: (2026)

Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models
by: Nigam, Shubham Kumar, et al.
Published: (2025)

OCR-Quality: A Human-Annotated Dataset for OCR Quality Assessment
by: Zhang, Yulong
Published: (2025)

HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)

DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
by: Cardoso, Gabriel Pimenta de Freitas, et al.
Published: (2026)

From Sora What We Can See: A Survey of Text-to-Video Generation
by: Sun, Rui, et al.
Published: (2024)

CFIS-YOLO: A Lightweight Multi-Scale Fusion Network for Edge-Deployable Wood Defect Detection
by: Kang, Jincheng, et al.
Published: (2025)

Vision-Language Models for Edge Networks: A Comprehensive Survey
by: Sharshar, Ahmed, et al.
Published: (2025)

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
by: Liu, Yuliang, et al.
Published: (2024)

On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)

Towards Efficient Image Deblurring for Edge Deployment
by: Miriyala, Srinivas, et al.
Published: (2026)

Large Sign Language Models: Toward 3D American Sign Language Translation
by: Zhang, Sen, et al.
Published: (2025)

Automated Invoice Data Extraction: Using LLM and OCR
by: Khanchandani, Khushi, et al.
Published: (2025)

InstructOCR: Instruction Boosting Scene Text Spotting
by: Duan, Chen, et al.
Published: (2024)

Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis
by: Naeem, Shahzeb, et al.
Published: (2024)

From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety
by: Sethupathy, Ganen, et al.
Published: (2026)

Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment
by: Wu, Jiaqi, et al.
Published: (2024)

Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics
by: Yang, Haiyu, et al.
Published: (2026)

Evaluating OCR performance on food packaging labels in South Africa
by: Nagayi, Mayimunah, et al.
Published: (2025)

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
by: Xu, Yiwen, et al.
Published: (2026)

PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language
by: Haq, Ijazul, et al.
Published: (2025)

QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation
by: Wasfy, Ahmed, et al.
Published: (2025)

Confidence-Aware Document OCR Error Detection
by: Hemmer, Arthur, et al.
Published: (2024)

KAN See In the Dark
by: Ning, Aoxiang, et al.
Published: (2024)

Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval
by: Most, Alexander, et al.
Published: (2025)

Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems
by: Faraz, Ali, et al.
Published: (2026)

VB: Visibility Benchmark for Visibility and Perspective Reasoning in Images
by: Tripathi, Neil
Published: (2026)

Deploying and Evaluating Multiple Deep Learning Models on Edge Devices for Diabetic Retinopathy Detection
by: Asare, Akwasi, et al.
Published: (2025)

Sign Spotting Disambiguation using Large Language Models
by: Low, JianHe, et al.
Published: (2025)

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
by: Yu, Ya-Qi, et al.
Published: (2024)

Multimodal Language Models See Better When They Look Shallower
by: Chen, Haoran, et al.
Published: (2025)

Seeing the Abstract: Translating the Abstract Language for Vision Language Models
by: Talon, Davide, et al.
Published: (2025)

Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
by: Shen, Zhixuan, et al.
Published: (2024)

Automated Wicket-Taking Delivery Segmentation and Trajectory-Based Dismissal-Zone Analysis in Cricket Videos Using OCR-Guided YOLOv8
by: Karmoker, Joy, et al.
Published: (2025)

SignMouth: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion
by: Wu, Wenfang, et al.
Published: (2025)

Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
by: Huang, Zheng, et al.
Published: (2025)

Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
by: Zhong, Yufeng, et al.
Published: (2025)

Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories
by: Pandiani, Delfina Sol Martinez, et al.
Published: (2023)

Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR
by: Shu, Jing, et al.
Published: (2024)

Modeling Intensification for Sign Language Generation: A Computational Approach
by: İnan, Mert, et al.
Published: (2022)