Bibliographic Details
Main Authors: Das, Aryan; Rachamalla, Tanishq; Biswas, Koushik; Roy, Swalpa Kumar; Verma, Vinay Kumar
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2602.14498
_version_ 1866910027672977408
author Das, Aryan
Rachamalla, Tanishq
Biswas, Koushik
Roy, Swalpa Kumar
Verma, Vinay Kumar
author_facet Das, Aryan
Rachamalla, Tanishq
Biswas, Koushik
Roy, Swalpa Kumar
Verma, Vinay Kumar
contents We introduce an uncertainty-aware multimodal segmentation framework that uses both radiological images and their associated clinical text to support precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which captures spatial overlap, spectral consistency, and predictive uncertainty in a single unified objective. This formulation improves model reliability in complex clinical cases where image quality is poor. Extensive experiments on three publicly available medical datasets (QATA-COVID19, MosMed++, and Kvasir-SEG) demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing state-of-the-art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
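The abstract names three ingredients of the SEU Loss (spatial overlap, spectral consistency, predictive uncertainty) but this record does not give its formula. The following is only a minimal illustrative sketch of how three such terms could be combined, not the authors' definition: it assumes a soft Dice term for overlap, a difference of 2D Fourier magnitudes for spectral consistency, and mean binary entropy for uncertainty. The function names and the weights w_spec and w_ent are hypothetical.

```python
import numpy as np


def dice_term(prob, target, eps=1e-6):
    """Soft Dice loss: 0 when the prediction overlaps the mask perfectly."""
    inter = np.sum(prob * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)


def spectral_term(prob, target):
    """Mean absolute gap between the 2D Fourier magnitudes (assumed form)."""
    mag_pred = np.abs(np.fft.fft2(prob, norm="ortho"))
    mag_true = np.abs(np.fft.fft2(target, norm="ortho"))
    return float(np.mean(np.abs(mag_pred - mag_true)))


def entropy_term(prob, eps=1e-6):
    """Mean per-pixel binary entropy: high when predictions sit near 0.5."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))))


def seu_like_loss(prob, target, w_spec=0.1, w_ent=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (
        dice_term(prob, target)
        + w_spec * spectral_term(prob, target)
        + w_ent * entropy_term(prob)
    )


# Toy 8x8 example: a square mask, a perfect prediction, and a maximally
# uncertain prediction (every pixel at probability 0.5).
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
perfect = mask.copy()
uncertain = np.full((8, 8), 0.5)

loss_perfect = seu_like_loss(perfect, mask)
loss_uncertain = seu_like_loss(uncertain, mask)
```

In this sketch the perfect prediction scores close to zero, while the uniform 0.5 prediction is penalised by all three terms. The actual formulation is in the paper and the linked repository.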
format Preprint
id arxiv_https___arxiv_org_abs_2602_14498
institution arXiv
publishDate 2026
record_format arxiv
title Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2602.14498