Staff View: HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction :: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Ruicheng; Zhang, Zhenxuan; Wang, Anbang; Hu, Liwei; Hua, Xiangqian; Peng, Yaya; Luo, Jiawei; Yang, Guang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2603.19957

_version_	1866918400044826624
author	Yuan, Ruicheng; Zhang, Zhenxuan; Wang, Anbang; Hu, Liwei; Hua, Xiangqian; Peng, Yaya; Luo, Jiawei; Yang, Guang
author_facet	Yuan, Ruicheng; Zhang, Zhenxuan; Wang, Anbang; Hu, Liwei; Hua, Xiangqian; Peng, Yaya; Luo, Jiawei; Yang, Guang
contents	Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing pathology vision-language models (VLMs) reduce this output to a flat label or free-form text. We present HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that treats structured report prediction as its primary training objective. Three trainable modules totalling 15M parameters address complementary aspects of the problem: a Hierarchical Patch Aggregator (HiPA) for multi-image visual encoding, Hierarchical Contrastive Learning (HiCL) for cross-modal alignment via optimal transport, and Slot-based Masked Diagnosis Prediction (Slot-MDP) for structured diagnosis generation. Trained on 749K real-world Chinese pathology cases from three hospitals, HiPath achieves 68.9% strict and 74.7% clinically acceptable accuracy with a 97.3% safety rate, outperforming all baselines under the same frozen backbone. Cross-hospital evaluation confirms generalisation with only a 3.4pp drop in strict accuracy while maintaining 97.1% safety.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_19957
institution	arXiv
publishDate	2026
record_format	arxiv
title	HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2603.19957

Similar Items