Staff View: LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering :: Library Catalog

Bibliographic Details
Main Authors:	Ma, Runze; Jia, Shunbo; Lyu, Haonan; Liu, Guo; Liao, Caizhi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Quantitative Methods
Online Access:	https://arxiv.org/abs/2605.09384

_version_	1866917477902974976
author	Ma, Runze; Jia, Shunbo; Lyu, Haonan; Liu, Guo; Liao, Caizhi
author_facet	Ma, Runze; Jia, Shunbo; Lyu, Haonan; Liu, Guo; Liao, Caizhi
contents	The reasoning gap between large and compact vision-language models (VLMs) limits the deployment of medical AI on portable clinical devices. Compact VLMs of 2-4B parameters can run on resource-constrained hardware but lack the multi-step reasoning capacity needed for interpretable clinical decision support. Existing knowledge distillation methods transfer answers without the reasoning process behind them. Medical visual question answering (VQA) serves as a testbed for this problem, as it requires models to integrate visual evidence with clinical knowledge through structured reasoning chains. We introduce LiteMedCoT-VL, a pipeline that transfers chain-of-thought reasoning from a 235B teacher model to 2B student models through LoRA-based fine-tuning on explanation-enriched training data. All inference is conducted without image captions by default, simulating the clinical scenario in which a physician interprets a medical image directly without an accompanying radiology report. On the PMC-VQA benchmark, LiteMedCoT-VL achieves 64.9% accuracy, exceeding the zero-shot Qwen3-VL-4B baseline of 53.9% by 11.0 percentage points and outperforming all published baselines. This result indicates that a 2B model with reasoning distillation can match or exceed models with twice the parameters. Visual grounding analysis shows that the model relies on image content rather than exploiting textual priors. Our code is publicly available at https://anonymous.4open.science/r/LiteMedCoT-VL.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_09384
institution	arXiv
publishDate	2026
record_format	arxiv
title	LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Quantitative Methods
url	https://arxiv.org/abs/2605.09384
