Bibliographic Details
Main Authors: Buckley, Thomas A., Weihrauch, Kian R., Latham, Katherine, Zhou, Andrew Z., Manrai, Padmini A., Manrai, Arjun K.
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2511.19652
_version_ 1866914169912033280
author Buckley, Thomas A.
Weihrauch, Kian R.
Latham, Katherine
Zhou, Andrew Z.
Manrai, Padmini A.
Manrai, Arjun K.
contents Despite being widely used to support clinical care, general-purpose large multimodal models (LMMs) have generally shown poor or inconclusive performance in medical image interpretation, particularly in pathology, where gigapixel images are used. However, prior studies have used either low-resolution thumbnails or random patches, which likely underestimated model performance. Here, we ask whether LMMs can be adapted to reason coherently and accurately in the evaluation of such images. In this study, we introduce Gigapixel Image Agent for Navigating Tissue (GIANT), the first framework that allows LMMs to iteratively navigate whole-slide images (WSIs) like a pathologist. Accompanying GIANT, we release MultiPathQA, a new benchmark, which comprises 934 WSI-level questions, encompassing five clinically-relevant tasks ranging from cancer diagnosis to open-ended reasoning. MultiPathQA also includes 128 questions, authored by two professional pathologists, requiring direct slide interpretation. Using MultiPathQA, we show that our simple agentic system substantially outperforms conventional patch- and thumbnail-based baselines, approaching or surpassing the performance of specialized models trained on millions of images. For example, on pathologist-authored questions, GPT-5 with GIANT achieves 62.5% accuracy, outperforming specialist pathology models such as TITAN (43.8%) and SlideChat (37.5%). Our findings reveal the strengths and limitations of current foundation models and ground future development of LMMs for expert reasoning in pathology.
format Preprint
id arxiv_https___arxiv_org_abs_2511_19652
institution arXiv
publishDate 2025
record_format arxiv
title Navigating Gigapixel Pathology Images with Large Multimodal Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2511.19652