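Staff View: Automated Histopathology Report Generation via Pyramidal Feature Extraction and the UNI Foundation Model :: Library Catalog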

Bibliographic Details
Main Authors:	Halici, Ahmet; Cebeci, Ece Tugba; Balci, Musa; Cini, Mustafa; Sokmen, Serkan
Format:	Preprint
Published:	2026
Subjects:	Image and Video Processing; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.16422

_version_	1866912911556870144
author	Halici, Ahmet; Cebeci, Ece Tugba; Balci, Musa; Cini, Mustafa; Sokmen, Serkan
author_facet	Halici, Ahmet; Cebeci, Ece Tugba; Balci, Musa; Cini, Mustafa; Sokmen, Serkan
contents	Generating diagnostic text from histopathology whole-slide images (WSIs) is challenging due to the gigapixel scale of the input and the requirement for precise, domain-specific language. We propose a hierarchical vision-language framework that combines a frozen pathology foundation model with a Transformer decoder for report generation. To make WSI processing tractable, we perform multi-resolution pyramidal patch selection (downsampling factors 2^3 to 2^6) and remove background and artifacts using Laplacian-variance and HSV-based criteria. Patch features are extracted with the UNI Vision Transformer and projected to a 6-layer Transformer decoder that generates diagnostic text via cross-attention. To better represent biomedical terminology, we tokenize the output using BioGPT. Finally, we add a retrieval-based verification step that compares generated reports with a reference corpus using Sentence-BERT embeddings; if a high-similarity match is found, the generated report is replaced with the retrieved ground-truth reference to improve reliability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_16422
institution	arXiv
publishDate	2026
record_format	arxiv
title	Automated Histopathology Report Generation via Pyramidal Feature Extraction and the UNI Foundation Model
topic	Image and Video Processing; Artificial Intelligence; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.16422
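
Illustrative sketch (not part of the catalog record): the contents field describes two steps that are easy to misread in prose, namely patch filtering by Laplacian variance and HSV-based criteria, and the retrieval-based replacement of a generated report. The Python below is a minimal sketch of both under stated assumptions and is not the authors' implementation. The function names, the thresholds (min_lap_var, min_sat, threshold), the use of a channel mean as grayscale, and the choice of mean saturation as the HSV criterion are all hypothetical. Report embeddings are assumed to be precomputed, non-zero vectors (for example from a Sentence-BERT model, which is not called here).

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    Low values suggest a flat or blurred patch (background, out of focus).
    """
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def mean_saturation(rgb):
    """Mean HSV saturation in [0, 1], computed directly from uint8 RGB."""
    x = rgb.astype(np.float64) / 255.0
    mx = x.max(axis=-1)
    mn = x.min(axis=-1)
    # Saturation is (max - min) / max, defined as 0 for black pixels.
    sat = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    return float(sat.mean())


def keep_patch(rgb, min_lap_var=50.0, min_sat=0.05):
    """Keep a patch only if it is both textured and coloured.

    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    gray = rgb.astype(np.float64).mean(axis=-1)  # simple grayscale stand-in
    return laplacian_variance(gray) >= min_lap_var and mean_saturation(rgb) >= min_sat


def verify_report(gen_vec, ref_vecs, ref_texts, generated, threshold=0.9):
    """Return the closest reference text if its cosine similarity to the
    generated report's embedding reaches the threshold; otherwise keep the
    generated text. The threshold is a hypothetical placeholder.
    """
    g = gen_vec / np.linalg.norm(gen_vec)
    r = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sims = r @ g  # cosine similarity against every reference
    best = int(np.argmax(sims))
    return ref_texts[best] if sims[best] >= threshold else generated
```

In this sketch a uniformly white patch is rejected because its Laplacian variance is zero, and a generated report is replaced only when some reference embedding is at least as similar as the chosen threshold; otherwise the decoder output is returned unchanged.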

Similar Items