

Bibliographic details
Main Authors:	Manzari, Omid Nejati; Asgariandehkordi, Hojat; Koleilat, Taha; Xiao, Yiming; Rivaz, Hassan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2604.01310

_version_	1866915906730328064
author	Manzari, Omid Nejati; Asgariandehkordi, Hojat; Koleilat, Taha; Xiao, Yiming; Rivaz, Hassan
author_facet	Manzari, Omid Nejati; Asgariandehkordi, Hojat; Koleilat, Taha; Xiao, Yiming; Rivaz, Hassan
contents	Large vision-language models (VLMs) excel on general benchmarks but often lack robustness in medical imaging, where heterogeneous supervision induces cross-dataset interference and sensitivity to data regime (i.e., how the supervisory signals are mixed). In realistic clinical workflows, data and tasks arrive sequentially, so naive continual training further leads to catastrophic forgetting. To address these challenges, we propose MedQwen, a parameter-efficient medical VLM that couples a spectrally routed Mixture-of-Experts (MoE) with a theoretically grounded scaling rule that aligns low-rank updates with a full-rank, fully fine-tuned MoE, without changing the base architecture. Concretely, we initialize each expert from non-overlapping singular value decomposition (SVD) segments of the pretrained weight and introduce a residual compensation and scaling scheme to enable stable expert specialization and consistent routing under distribution shift. Across 23 medical datasets covering visual question answering, report generation, radiology classification, and hallucination mitigation, MedQwen achieves strong, reliable performance: it approaches full fine-tuning on zero-shot classification with 339× fewer trainable parameters, and reduces sequential forgetting to ~5% where strong baselines degrade by >20-50%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_01310
institution	arXiv
publishDate	2026
record_format	arxiv
title	Sparse Spectral LoRA: Routed Experts for Medical VLMs
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.01310
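
The abstract above says each expert is initialized from a non-overlapping SVD segment of the pretrained weight. The following is a minimal NumPy sketch of that one idea, not the authors' implementation. The function name `split_svd_experts`, the choice of consecutive top singular directions, the square-root split of singular values between the two low-rank factors, and the plain subtraction used for the residual are all assumptions made for illustration; the record does not specify MedQwen's routing, scaling rule, or residual compensation scheme.

```python
import numpy as np


def split_svd_experts(weight, num_experts, rank):
    """Hypothetical helper: build low-rank factors from disjoint SVD segments.

    Expert e receives singular directions [e*rank, (e+1)*rank), so no two
    experts share a direction. The residual is whatever the experts do not
    cover (an assumption, not the paper's compensation scheme).
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    if num_experts * rank > s.shape[0]:
        raise ValueError("num_experts * rank exceeds the number of singular values")

    experts = []
    for e in range(num_experts):
        seg = slice(e * rank, (e + 1) * rank)
        sqrt_s = np.sqrt(s[seg])
        b = u[:, seg] * sqrt_s         # shape (out_dim, rank)
        a = sqrt_s[:, None] * vt[seg]  # shape (rank, in_dim)
        experts.append((b, a))

    # Part of the weight not captured by any expert segment.
    residual = weight - sum(b @ a for b, a in experts)
    return experts, residual


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))  # stand-in for a pretrained weight matrix
experts, residual = split_svd_experts(w, num_experts=3, rank=2)

# Residual plus all expert products recovers the original weight.
recon = residual + sum(b @ a for b, a in experts)
```

Because the segments come from orthogonal singular vectors, the column spaces of different experts' `b` factors are mutually orthogonal at initialization, which is one plausible reading of why such a split could encourage expert specialization.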

Related records