Bibliographic Details
Main Authors: Imran, Muhammad; Lee, Chi; Lee, Yugyung
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.11666
author Imran, Muhammad
Lee, Chi
Lee, Yugyung
contents We introduce MATEX (Multi-scale Attention and Text-guided Explainability), a novel framework that advances interpretability in medical vision-language models by incorporating anatomically informed spatial reasoning. MATEX synergistically combines multi-layer attention rollout, text-guided spatial priors, and layer consistency analysis to produce precise, stable, and clinically meaningful gradient attribution maps. By addressing key limitations of prior methods, such as spatial imprecision, lack of anatomical grounding, and limited attention granularity, MATEX enables more faithful and interpretable model explanations. Evaluated on the MS-CXR dataset, MATEX outperforms the state-of-the-art M2IB approach in both spatial precision and alignment with expert-annotated findings. These results highlight MATEX's potential to enhance trust and transparency in radiological AI applications.
format Preprint
id arxiv_https___arxiv_org_abs_2601_11666
institution arXiv
publishDate 2026
record_format arxiv
title MATEX: Multi-scale Attention and Text-guided Explainability of Medical Vision-Language Models
topic Computer Vision and Pattern Recognition
Artificial Intelligence
68T45, 68U10, 92C55
I.2.10; I.4.8; H.2.8; J.3
url https://arxiv.org/abs/2601.11666