Bibliographic Details
Main Authors:	Wang, Wenjie; Wu, Wei; Liu, Ying; Zhao, Yuan; Lv, Xiaole; Diao, Liang; Fan, Zengjian; Xie, Wenfeng; Lin, Ziling; Shi, De; Huang, Lin; Xu, Kaihe; Li, Hong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.06402

Table of Contents:

Medical document OCR is challenging due to complex layouts, domain-specific terminology, and noisy annotations, while requiring strict field-level exact matching. Existing OCR systems and general-purpose vision-language models often fail to reliably parse such documents. We propose MeDocVL, a post-trained vision-language model for query-driven medical document parsing. Our framework combines Training-driven Label Refinement to construct high-quality supervision from noisy annotations, with a Noise-aware Hybrid Post-training strategy that integrates reinforcement learning and supervised fine-tuning to achieve robust and precise extraction. Experiments on medical invoice benchmarks show that MeDocVL consistently outperforms conventional OCR systems and strong VLM baselines, achieving state-of-the-art performance under noisy supervision.

Similar Items