Staff View: MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration

Bibliographic Details
Main Authors:	Zhang, Chenran; Wu, Ruiqi; Zhou, Tao; Zhou, Yi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.09101

_version_	1866908875650760704
author	Zhang, Chenran; Wu, Ruiqi; Zhou, Tao; Zhou, Yi
author_facet	Zhang, Chenran; Wu, Ruiqi; Zhou, Tao; Zhou, Yi
contents	Medical vision-language pretraining (VLP) models have recently been investigated for their generalization to diverse downstream tasks. However, current medical VLP methods typically force the model to learn simple and complex concepts simultaneously. This anti-cognitive process leads to suboptimal feature representations, especially under distribution shift. To address this limitation, we propose a Knowledge-driven Cognitive Orchestration for Medical VLP (MedKCO) that involves both the ordering of the pretraining data and the learning objective of vision-language contrast. Specifically, we design a two-level curriculum by incorporating diagnostic sensitivity and intra-class sample representativeness for the ordering of the pretraining data. Moreover, considering the inter-class similarity of medical images, we introduce a self-paced asymmetric contrastive loss to dynamically adjust the participation of the pretraining objective. We evaluate the proposed pretraining method on three medical imaging scenarios in multiple vision-language downstream tasks, and compare it with several curriculum learning methods. Extensive experiments show that our method significantly surpasses all baselines. Code is available at https://github.com/Mr-Talon/MedKCO.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_09101
institution	arXiv
publishDate	2026
record_format	arxiv
title	MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.09101

Similar Items