Bibliographic Details
Main Authors: Ramchandani, Lavish; Tinaikar, Aashay; Das, Dev Kumar; Garg, Rohit; Thomas, Tijo
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.18747
_version_ 1866915810975416320
author Ramchandani, Lavish
Tinaikar, Aashay
Das, Dev Kumar
Garg, Rohit
Thomas, Tijo
contents In recent years, foundation models such as CLIP, DINO, and CONCH have demonstrated remarkable domain generalization and unsupervised feature extraction capabilities across diverse imaging tasks. However, systematic and independent evaluations of these models for pixel-level semantic segmentation in histopathology remain scarce. In this study, we propose a robust benchmarking approach to assess 10 foundation models on four histopathological datasets covering both morphological tissue-region and cellular/nuclear segmentation tasks. Our method leverages attention maps of foundation models as pixel-wise features, which are then classified using a machine learning algorithm, XGBoost, enabling fast, interpretable, and model-agnostic evaluation without fine-tuning. We show that the vision-language foundation model CONCH performed best across datasets when compared to vision-only foundation models, with PathDino a close second. Further analysis shows that models trained on distinct histopathology cohorts capture complementary morphological representations, and concatenating their features yields superior segmentation performance. Concatenating features from CONCH, PathDino, and CellViT outperformed individual models across all the datasets by 7.95% (averaged across the datasets), suggesting that ensembles of foundation models can better generalize to diverse histopathological segmentation tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2602_18747
institution arXiv
publishDate 2026
record_format arxiv
title Benchmarking Computational Pathology Foundation Models For Semantic Segmentation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.18747
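
The abstract describes using attention maps as pixel-wise features and concatenating features from several models before classification. The sketch below illustrates only that feature-building step, under stated assumptions: the array shapes, the head counts, the nearest-neighbour upsampling, and the helper names `upsample_nearest` and `pixel_features` are hypothetical and do not come from the preprint, and random arrays stand in for real model outputs.

```python
import numpy as np


def upsample_nearest(attn, out_h, out_w):
    """Nearest-neighbour upsample of maps shaped (heads, h, w) to (heads, out_h, out_w)."""
    _, h, w = attn.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return attn[:, rows[:, None], cols[None, :]]


def pixel_features(attn_maps, out_h, out_w):
    """Concatenate several models' maps into an (out_h * out_w, total_heads) matrix."""
    upsampled = [upsample_nearest(a, out_h, out_w) for a in attn_maps]
    stacked = np.concatenate(upsampled, axis=0)  # (total_heads, out_h, out_w)
    return stacked.reshape(stacked.shape[0], -1).T  # one row per pixel


# Placeholder data (assumption): two models with different head counts and grids.
rng = np.random.default_rng(0)
model_a = rng.random((6, 14, 14))
model_b = rng.random((12, 16, 16))

X = pixel_features([model_a, model_b], 224, 224)
print(X.shape)
```

Each row of `X` is the feature vector for one pixel. Paired with per-pixel labels, a matrix of this shape could be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier` through its `fit(X, y)` method; the preprint's actual extraction and training settings are not given in this record.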