Staff View: MATEX: Multi-scale Attention and Text-guided Explainability of Medical Vision-Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Imran, Muhammad; Lee, Chi; Lee, Yugyung
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; MSC classes: 68T45, 68U10, 92C55; ACM classes: I.2.10, I.4.8, H.2.8, J.3
Online Access:	https://arxiv.org/abs/2601.11666

_version_	1866911381934047232
author	Imran, Muhammad; Lee, Chi; Lee, Yugyung
author_facet	Imran, Muhammad; Lee, Chi; Lee, Yugyung
contents	We introduce MATEX (Multi-scale Attention and Text-guided Explainability), a novel framework that advances interpretability in medical vision-language models by incorporating anatomically informed spatial reasoning. MATEX synergistically combines multi-layer attention rollout, text-guided spatial priors, and layer consistency analysis to produce precise, stable, and clinically meaningful gradient attribution maps. By addressing key limitations of prior methods, such as spatial imprecision, lack of anatomical grounding, and limited attention granularity, MATEX enables more faithful and interpretable model explanations. Evaluated on the MS-CXR dataset, MATEX outperforms the state-of-the-art M2IB approach in both spatial precision and alignment with expert-annotated findings. These results highlight MATEX's potential to enhance trust and transparency in radiological AI applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_11666
institution	arXiv
publishDate	2026
record_format	arxiv
title	MATEX: Multi-scale Attention and Text-guided Explainability of Medical Vision-Language Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; MSC classes: 68T45, 68U10, 92C55; ACM classes: I.2.10, I.4.8, H.2.8, J.3
url	https://arxiv.org/abs/2601.11666