Bibliographic Details
Main Authors: Kasireddy, Harishwar Reddy, La Rosa, Patricio S., Gupta, Akshita, Paul, Anindya S., Fermin, Jamie L., Clapp, William L., Waldman, Meryl A., El-Ashkar, Tarek M., Jain, Sanjay, Rodrigues, Luis, Jen, Kuang Yu, Rosenberg, Avi Z., Eadon, Michael T., Hodgin, Jeffrey B., Sarder, Pinaki
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2603.15967
Table of Contents:
  • Histopathology foundation models (HFMs), pretrained on large-scale cancer datasets, have advanced computational pathology. However, their applicability to non-cancerous chronic kidney disease remains underexplored, despite the coexistence of renal pathology with malignancies such as renal cell and urothelial carcinoma. We systematically evaluate 11 publicly available HFMs across 11 kidney-specific downstream tasks spanning multiple stains (PAS, H&E, PASM, and IHC), spatial scales (tile- and slide-level), task types (classification, regression, and copy detection), and clinical objectives, including detection, diagnosis, and prognosis. Tile-level performance is assessed using repeated stratified group cross-validation, while slide-level tasks are evaluated using repeated nested stratified cross-validation. Statistical significance is examined using the Friedman test followed by pairwise Wilcoxon signed-rank testing with Holm-Bonferroni correction and compact letter display visualization. To promote reproducibility, we release an open-source Python package, kidney-hfm-eval, available at https://pypi.org/project/kidney-hfm-eval/, that reproduces the evaluation pipelines. Results show moderate to strong performance on tasks driven by coarse meso-scale renal morphology, including diagnostic classification and detection of prominent structural alterations. In contrast, performance consistently declines for tasks requiring fine-grained microstructural discrimination, complex biological phenotypes, or slide-level prognostic inference, largely independent of stain type. Overall, current HFMs appear to encode predominantly static meso-scale representations and may have limited capacity to capture subtle renal pathology or prognosis-related signals. Our results highlight the need for kidney-specific, multi-stain, and multimodal foundation models to support clinically reliable decision-making in nephrology.
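The Holm-Bonferroni step named in the abstract can be illustrated with a minimal sketch. This is not taken from the kidney-hfm-eval package. The function name `holm_bonferroni`, the model labels, and the raw p-values are hypothetical, chosen only to show how the step-down adjustment works on a set of pairwise p-values such as those a Wilcoxon signed-rank test might produce.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    Step-down procedure: sort the p-values in ascending order, multiply the
    k-th smallest (k = 0, 1, ...) by (m - k), cap at 1.0, and enforce that
    adjusted values never decrease along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise model comparisons
# (illustrative numbers only, not results from the paper).
raw = {"model_A vs model_B": 0.01,
       "model_A vs model_C": 0.04,
       "model_B vs model_C": 0.03}

adjusted = dict(zip(raw, holm_bonferroni(list(raw.values()))))

for pair, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{pair}: adjusted p = {p_adj:.3f} ({verdict})")
```

In a full pipeline of the kind the abstract describes, the omnibus Friedman test would be run first, and the pairwise tests with this correction would follow only if it rejects. Library routines for those tests exist (for example in SciPy), but whether the released package uses them is not stated in this record.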