Staff View: MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis :: Library Catalog

Bibliographic Details
Main Authors:	She, Chengying; Chen, Chengwei; Zhang, Xinran; Wang, Ben; Liu, Lizhuang; Shao, Chengwei; Bian, Yun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.20347

_version_	1866912874397433856
author	She, Chengying; Chen, Chengwei; Zhang, Xinran; Wang, Ben; Liu, Lizhuang; Shao, Chengwei; Bian, Yun
author_facet	She, Chengying; Chen, Chengwei; Zhang, Xinran; Wang, Ben; Liu, Lizhuang; Shao, Chengwei; Bian, Yun
contents	Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1–9.8% C-index improvements compared with unimodal methods and 5.6–7.1% over multimodal alternatives.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_20347
institution	arXiv
publishDate	2026
record_format	arxiv
title	MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.20347