Staff View: Uncertainty-Aware Vision-Language Segmentation for Medical Imaging :: Library Catalog

Bibliographic Details
Main Authors:	Das, Aryan; Rachamalla, Tanishq; Biswas, Koushik; Roy, Swalpa Kumar; Verma, Vinay Kumar
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2602.14498

_version_	1866910027672977408
author	Das, Aryan; Rachamalla, Tanishq; Biswas, Koushik; Roy, Swalpa Kumar; Verma, Vinay Kumar
author_facet	Das, Aryan; Rachamalla, Tanishq; Biswas, Koushik; Roy, Swalpa Kumar; Verma, Vinay Kumar
contents	We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. In complex clinical circumstances with poor image quality, this formulation improves model reliability. Extensive experiments on various publicly available medical datasets, QATA-COVID19, MosMed++, and Kvasir-SEG, demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing State-of-the-Art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_14498
institution	arXiv
publishDate	2026
record_format	arxiv
title	Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2602.14498
