Library Catalog

Bibliographic Details
Main Authors:	Chung, Yan Hon Michael; Choi, Donghyeok
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.06761

Similar Items

Anchor-based Robust Finetuning of Vision-Language Models
by: Han, Jinwei, et al.
Published: (2024)

Ocean-OCR: Towards General OCR Application via a Vision-Language Model
by: Chen, Song, et al.
Published: (2025)

Error Patterns in Historical OCR: A Comparative Analysis of TrOCR and a Vision-Language Model
by: Vesalainen, Ari, et al.
Published: (2026)

RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages
by: Kashid, Harshvivek, et al.
Published: (2024)

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
by: Taghadouini, Said, et al.
Published: (2026)

Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
by: Hennara, Khalil, et al.
Published: (2025)

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
by: Zhong, Yufeng, et al.
Published: (2026)

AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
by: Momayiz, Imane, et al.
Published: (2026)

KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
by: Gagnier, Henry, et al.
Published: (2026)

Online In-Context Distillation for Low-Resource Vision Language Models
by: Kang, Zhiqi, et al.
Published: (2025)

From Plausibility to Verifiability: Risk-Controlled Generative OCR with Vision-Language Models
by: Gong, Weile, et al.
Published: (2026)

CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
by: Jha, Saurav, et al.
Published: (2024)

DianJin-OCR-R1: Enhancing OCR Capabilities via a Reasoning-and-Tool Interleaved Vision-Language Model
by: Chen, Qian, et al.
Published: (2025)

Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
by: Zang, Yuhang, et al.
Published: (2024)

OmniOCR: Generalist OCR for Ethnic Minority Languages
by: Liu, Bonan, et al.
Published: (2026)

Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages
by: Salmè, Marco, et al.
Published: (2025)

Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail
by: Lamm, Bianca, et al.
Published: (2024)

Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
by: Mitra, Chancharik, et al.
Published: (2025)

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
by: Wang, Haoxuan, et al.
Published: (2024)

STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data
by: Seo, Yongdeuk, et al.
Published: (2025)

Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR
by: Shu, Jing, et al.
Published: (2024)

How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
by: Baek, Ingeol, et al.
Published: (2025)

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
by: Wang, Weizhi, et al.
Published: (2024)

The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models
by: Chopra, Shivang, et al.
Published: (2026)

PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks
by: Cui, Cheng, et al.
Published: (2026)

Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models
by: Horawalavithana, Sameera, et al.
Published: (2026)

Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques
by: K., Abhinand, et al.
Published: (2024)

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)

A Robust Deep Learning Framework for Bangla License Plate Recognition Using YOLO and Vision-Language OCR
by: Hasin, Nayeb, et al.
Published: (2026)

Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)

Improving OCR for Historical Texts of Multiple Languages
by: Westerdijk, Hylke, et al.
Published: (2025)

BehaviorVLM: Unified Finetuning-Free Behavioral Understanding with Vision-Language Reasoning
by: Ke, Jingyang, et al.
Published: (2026)

VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
by: Xu, Zhefan, et al.
Published: (2026)

GutenOCR: A Grounded Vision-Language Front-End for Documents
by: Heidenreich, Hunter, et al.
Published: (2026)

Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives
by: Park, Ji-jun, et al.
Published: (2024)

Low-Resource Vision Challenges for Foundation Models
by: Zhang, Yunhua, et al.
Published: (2024)

The Spatial Blindspot of Vision-Language Models
by: Alam, Nahid, et al.
Published: (2026)

Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols
by: Padó, Sebastian, et al.
Published: (2025)

MVT: Mask-Grounded Vision-Language Models for Taxonomy-Aligned Land-Cover Tagging
by: Chen, Siyi, et al.
Published: (2025)