Bibliographic Details
Main Authors: Zhou, Kangcheng; Jiang, Jun; Zhang, Qing; Zheng, Shuang; Li, Qingli; Xu, Shugong
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.14757
_version_ 1866908779516264448
author Zhou, Kangcheng
Jiang, Jun
Zhang, Qing
Zheng, Shuang
Li, Qingli
Xu, Shugong
contents Interpretability is important in computational pathology and has motivated the integration of multimodal information from histopathological images and their corresponding text data. However, existing multimodal methods offer limited interpretability, both because high-quality datasets that support explicit reasoning and inference are lacking and because their reasoning processes are simple. To address these problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities. To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization. We construct a high-quality pathology visual question answering (VQA) dataset specifically designed to support complex reasoning tasks. Comprehensive experiments conducted on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data. Our method also achieves performance comparable to CLIP on the downstream zero-shot image classification task.
format Preprint
id arxiv_https___arxiv_org_abs_2601_14757
institution arXiv
publishDate 2026
record_format arxiv
title ReinPath: A Multimodal Reinforcement Learning Approach for Pathology
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2601.14757