Saved in:
Bibliographic Details
Main Authors: Wang, Jun, Mao, Yu, Guan, Nan, Xue, Chun Jason
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2410.01408
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866914962641780736
author Wang, Jun
Mao, Yu
Guan, Nan
Xue, Chun Jason
author_facet Wang, Jun
Mao, Yu
Guan, Nan
Xue, Chun Jason
contents The multimodal model has demonstrated promise in histopathology. However, most multimodal models are based on H\&E and genomics, adopting increasingly complex yet black-box designs. In our paper, we propose a novel interpretable multimodal framework named SHAP-CAT, which uses a Shapley-value-based dimension reduction technique for effective multimodal fusion. Starting with two paired modalities -- H\&E and IHC images, we employ virtual staining techniques to enhance limited input data by generating a new clinical-related modality. Lightweight bag-level representations are extracted from image modalities and a Shapley-value-based mechanism is used for dimension reduction. For each dimension of the bag-level representation, attribution values are calculated to indicate how changes in the specific dimensions of the input affect the model output. In this way, we select a few top important dimensions of bag-level representation for each image modality to late fusion. Our experimental results demonstrate that the proposed SHAP-CAT framework incorporating synthetic modalities significantly enhances model performance, yielding a 5\% increase in accuracy for the BCI, an 8\% increase for IHC4BC-ER, and an 11\% increase for the IHC4BC-PR dataset.
format Preprint
id arxiv_https___arxiv_org_abs_2410_01408
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion
Wang, Jun
Mao, Yu
Guan, Nan
Xue, Chun Jason
Computer Vision and Pattern Recognition
The multimodal model has demonstrated promise in histopathology. However, most multimodal models are based on H\&E and genomics, adopting increasingly complex yet black-box designs. In our paper, we propose a novel interpretable multimodal framework named SHAP-CAT, which uses a Shapley-value-based dimension reduction technique for effective multimodal fusion. Starting with two paired modalities -- H\&E and IHC images, we employ virtual staining techniques to enhance limited input data by generating a new clinical-related modality. Lightweight bag-level representations are extracted from image modalities and a Shapley-value-based mechanism is used for dimension reduction. For each dimension of the bag-level representation, attribution values are calculated to indicate how changes in the specific dimensions of the input affect the model output. In this way, we select a few top important dimensions of bag-level representation for each image modality to late fusion. Our experimental results demonstrate that the proposed SHAP-CAT framework incorporating synthetic modalities significantly enhances model performance, yielding a 5\% increase in accuracy for the BCI, an 8\% increase for IHC4BC-ER, and an 11\% increase for the IHC4BC-PR dataset.
title SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2410.01408