Staff View: Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology :: Library Catalog

Bibliographic Details
Main Authors:	Han, Minghao; Yang, Dingkang; Qu, Linhao; Chen, Zizhi; Li, Gang; Wang, Han; Wang, Jiacong; Zhang, Lihua
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.13944

_version_	1866915799469391872
author	Han, Minghao; Yang, Dingkang; Qu, Linhao; Chen, Zizhi; Li, Gang; Wang, Han; Wang, Jiacong; Zhang, Lihua
author_facet	Han, Minghao; Yang, Dingkang; Qu, Linhao; Chen, Zizhi; Li, Gang; Wang, Han; Wang, Jiacong; Zhang, Lihua
contents	Recent years have witnessed remarkable progress in multimodal learning within computational pathology. Existing models primarily rely on vision and language modalities; however, language alone lacks molecular specificity and offers limited pathological supervision, leading to representational bottlenecks. In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. Incorporating spatial context and multi-scale information further enhances model performance and generalizability. To support this, we constructed SpaVis-6M, the largest Visium-based spatial transcriptomics dataset to date, and trained a spatially-aware gene encoder on this resource. Leveraging hierarchical multi-scale contrastive alignment and cross-scale patch localization mechanisms, STAMP effectively aligns spatial transcriptomics with pathology images, capturing spatial structure and molecular variation. We validate STAMP across six datasets and four downstream tasks, where it consistently achieves strong performance. These results highlight the value and necessity of integrating spatially resolved molecular supervision for advancing multimodal learning in computational pathology. The code is included in the supplementary materials. The pretrained weights and SpaVis-6M are available at: https://github.com/Hanminghao/STAMP.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13944
institution	arXiv
publishDate	2026
record_format	arxiv
title	Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.13944
