| Main Authors: | She, Chengying; Chen, Chengwei; Zhang, Xinran; Wang, Ben; Liu, Lizhuang; Shao, Chengwei; Bian, Yun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2601.20347 |
| _version_ | 1866912874397433856 |
|---|---|
| author | She, Chengying; Chen, Chengwei; Zhang, Xinran; Wang, Ben; Liu, Lizhuang; Shao, Chengwei; Bian, Yun |
| author_facet | She, Chengying; Chen, Chengwei; Zhang, Xinran; Wang, Ben; Liu, Lizhuang; Shao, Chengwei; Bian, Yun |
| contents | Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1–9.8% C-index improvements compared with unimodal methods and 5.6–7.1% over multimodal alternatives. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_20347 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2601.20347 |
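
Note on the survival metric: the abstract reports survival results as C-index improvements. For readers unfamiliar with the metric, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not taken from the preprint; the function name `concordance_index`, its argument names, and the tie-handling convention (tied risks count as half-concordant, tied times are skipped) are assumptions made for illustration, and the authors' evaluation code may differ.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times  -- observed follow-up times
    events -- 1/True if the event was observed, 0/False if censored
    risks  -- predicted risk scores (higher means earlier expected event)
    Illustrative helper only; not the preprint's implementation.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip pairs with tied times, and pairs where the earlier
        # subject was censored (their true ordering is unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier event got the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # assumed convention: tied risks count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of comparable pairs.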