| Main authors: | Chung, Yan Hon Michael, Choi, Donghyeok |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2507.06761 |
Similar items
Anchor-based Robust Finetuning of Vision-Language Models
by: Han, Jinwei, et al.
Published: (2024)
Ocean-OCR: Towards General OCR Application via a Vision-Language Model
by: Chen, Song, et al.
Published: (2025)
Error Patterns in Historical OCR: A Comparative Analysis of TrOCR and a Vision-Language Model
by: Vesalainen, Ari, et al.
Published: (2026)
RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages
by: Kashid, Harshvivek, et al.
Published: (2024)
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
by: Taghadouini, Said, et al.
Published: (2026)
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
by: Hennara, Khalil, et al.
Published: (2025)
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
by: Zhong, Yufeng, et al.
Published: (2026)
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
by: Momayiz, Imane, et al.
Published: (2026)
KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
by: Gagnier, Henry, et al.
Published: (2026)
Online In-Context Distillation for Low-Resource Vision Language Models
by: Kang, Zhiqi, et al.
Published: (2025)
From Plausibility to Verifiability: Risk-Controlled Generative OCR with Vision-Language Models
by: Gong, Weile, et al.
Published: (2026)
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
by: Jha, Saurav, et al.
Published: (2024)
DianJin-OCR-R1: Enhancing OCR Capabilities via a Reasoning-and-Tool Interleaved Vision-Language Model
by: Chen, Qian, et al.
Published: (2025)
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
by: Zang, Yuhang, et al.
Published: (2024)
OmniOCR: Generalist OCR for Ethnic Minority Languages
by: Liu, Bonan, et al.
Published: (2026)
Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages
by: Salmè, Marco, et al.
Published: (2025)
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail
by: Lamm, Bianca, et al.
Published: (2024)
Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
by: Mitra, Chancharik, et al.
Published: (2025)
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
by: Wang, Haoxuan, et al.
Published: (2024)
STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data
by: Seo, Yongdeuk, et al.
Published: (2025)
Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR
by: Shu, Jing, et al.
Published: (2024)
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
by: Baek, Ingeol, et al.
Published: (2025)
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
by: Wang, Weizhi, et al.
Published: (2024)
The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models
by: Chopra, Shivang, et al.
Published: (2026)
PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks
by: Cui, Cheng, et al.
Published: (2026)
Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models
by: Horawalavithana, Sameera, et al.
Published: (2026)
Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques
by: K., Abhinand, et al.
Published: (2024)
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)
A Robust Deep Learning Framework for Bangla License Plate Recognition Using YOLO and Vision-Language OCR
by: Hasin, Nayeb, et al.
Published: (2026)
Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
Improving OCR for Historical Texts of Multiple Languages
by: Westerdijk, Hylke, et al.
Published: (2025)
BehaviorVLM: Unified Finetuning-Free Behavioral Understanding with Vision-Language Reasoning
by: Ke, Jingyang, et al.
Published: (2026)
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
by: Xu, Zhefan, et al.
Published: (2026)
GutenOCR: A Grounded Vision-Language Front-End for Documents
by: Heidenreich, Hunter, et al.
Published: (2026)
Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives
by: Park, Ji-jun, et al.
Published: (2024)
Low-Resource Vision Challenges for Foundation Models
by: Zhang, Yunhua, et al.
Published: (2024)
The Spatial Blindspot of Vision-Language Models
by: Alam, Nahid, et al.
Published: (2026)
Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols
by: Padó, Sebastian, et al.
Published: (2025)
MVT: Mask-Grounded Vision-Language Models for Taxonomy-Aligned Land-Cover Tagging
by: Chen, Siyi, et al.
Published: (2025)