Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Xiaoliu; Xiao, Minxue; Xie, Ting; Wang, Mengzhu; Qi, Huiqing; Zhou, Joey Tianyi; Zhang, Taiping; Wang, Xu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.23977

_version_	1866908995042672640
author	Luo, Xiaoliu; Xiao, Minxue; Xie, Ting; Wang, Mengzhu; Qi, Huiqing; Zhou, Joey Tianyi; Zhang, Taiping; Wang, Xu
contents	Accurate biomedical image classification under low-resource conditions remains challenging due to limited annotations, subtle inter-class visual differences, and complex disease semantics. While vision-language models offer a promising foundation for mitigating data scarcity, their effective adaptation in biomedical settings is constrained by the need for parameter-efficient tuning alongside fine-grained and semantically consistent representation learning. In this work, we propose Multi-View Synergistic Learning (MVSL), a unified framework that addresses these challenges by jointly considering adaptation paradigms, representation granularity, and disease semantic relationships. MVSL decouples the adaptation of visual and textual encoders to respect their distinct representational characteristics, enabling more stable and effective parameter-efficient fine-tuning. It further introduces multi-granularity contrastive learning to explicitly model both global image semantics and localized lesion-level evidence, improving fine-grained discrimination for visually similar disease categories. In addition, MVSL preserves disease-level semantic structure by incorporating structured supervision derived from large language models, which constrains textual representations at the class level and indirectly regularizes visual embeddings through cross-modal alignment. Together, these components enable more stable cross-modal alignment and improved discrimination under limited supervision. Extensive experiments on 11 public biomedical datasets spanning 9 imaging modalities and 10 anatomical regions demonstrate that MVSL consistently outperforms state-of-the-art methods in few-shot and zero-shot classification settings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_23977
institution	arXiv
publishDate	2026
record_format	arxiv
title	Multi-View Synergistic Learning with Vision-Language Adaption for Low-Resource Biomedical Image Classification
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.23977

Similar Items