Bibliographic Details
Main Authors: Wang, Wenjie, Wu, Wei, Liu, Ying, Zhao, Yuan, Lv, Xiaole, Diao, Liang, Fan, Zengjian, Xie, Wenfeng, Lin, Ziling, Shi, De, Huang, Lin, Xu, Kaihe, Li, Hong
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.06402
Table of Contents:
  • Medical document OCR is challenging because of complex layouts, domain-specific terminology, and noisy annotations, and it requires strict field-level exact matching. Existing OCR systems and general-purpose vision-language models often fail to parse such documents reliably. We propose MeDocVL, a post-trained vision-language model for query-driven medical document parsing. Our framework combines Training-driven Label Refinement, which constructs high-quality supervision from noisy annotations, with a Noise-aware Hybrid Post-training strategy that integrates reinforcement learning and supervised fine-tuning for robust and precise extraction. Experiments on medical invoice benchmarks show that MeDocVL consistently outperforms conventional OCR systems and strong VLM baselines, achieving state-of-the-art performance under noisy supervision.