Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Zhou, Kangcheng, Jiang, Jun, Zhang, Qing, Zheng, Shuang, Li, Qingli, Xu, Shugong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.14757

_version_	1866908779516264448
author	Zhou, Kangcheng; Jiang, Jun; Zhang, Qing; Zheng, Shuang; Li, Qingli; Xu, Shugong
author_facet	Zhou, Kangcheng; Jiang, Jun; Zhang, Qing; Zheng, Shuang; Li, Qingli; Xu, Shugong
contents	Interpretability is important in computational pathology, motivating the development of methods that integrate multimodal information from histopathological images and the corresponding text data. However, existing multimodal methods offer limited interpretability, owing both to the lack of high-quality datasets that support explicit reasoning and inference and to their simple reasoning processes. To address these problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities. To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization (GRPO). We also construct a high-quality pathology visual question answering (VQA) dataset specifically designed to support complex reasoning tasks. Comprehensive experiments on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data. Our method also achieves performance comparable to CLIP on downstream zero-shot image classification tasks.
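The record does not say how the semantic reward is computed. As a rough illustration only, the sketch below assumes a hypothetical stand-in: the reward for each sampled answer is the cosine similarity between its embedding and a reference answer's embedding. It then applies the group-relative advantage normalization used in GRPO, A_i = (r_i - mean(r)) / std(r), over a group of answers sampled for the same question. The embeddings, the function names, and the choice of population standard deviation are all assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical semantic reward: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as carrying no semantic signal
    return dot / (norm_a * norm_b)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: center each reward on the group mean, scale by group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation of the group (an assumption; variants differ).
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical embeddings: one reference answer and a group of three sampled answers.
reference = [1.0, 0.0, 0.0]
sampled = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

rewards = [cosine_similarity(reference, s) for s in sampled]
advantages = group_relative_advantages(rewards)
```

Because the advantages are normalized within each group, answers are rewarded only relative to their siblings for the same question, which removes the need for a separately learned value model.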
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_14757
institution	arXiv
publishDate	2026
record_format	arxiv
title	ReinPath: A Multimodal Reinforcement Learning Approach for Pathology
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.14757

Similar Items