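Staff View: Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis :: Library Catalog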

Bibliographic Details
Main Authors:	Zhang, Shengxuming; Li, Weihan; Gao, Tianhong; Hu, Jiacong; Luo, Haoming; Zhang, Xiuming; Zhang, Jing; Song, Mingli; Feng, Zunlei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2412.09521

_version_	1866915289349750784
author	Zhang, Shengxuming Li, Weihan Gao, Tianhong Hu, Jiacong Luo, Haoming Zhang, Xiuming Zhang, Jing Song, Mingli Feng, Zunlei
author_facet	Zhang, Shengxuming Li, Weihan Gao, Tianhong Hu, Jiacong Luo, Haoming Zhang, Xiuming Zhang, Jing Song, Mingli Feng, Zunlei
contents	Pathological diagnosis is vital for determining disease characteristics, guiding treatment, and assessing prognosis, relying heavily on detailed, multi-scale analysis of high-resolution whole slide images (WSI). However, existing large vision-language models (LVLMs) are limited by input resolution constraints, hindering their efficiency and accuracy in pathology image analysis. To overcome these issues, we propose two innovative strategies: the mixed task-guided feature enhancement, which directs feature extraction toward lesion-related details across scales, and the prompt-guided detail feature completion, which integrates coarse- and fine-grained features from WSI based on specific prompts without compromising inference speed. Leveraging a comprehensive dataset of 490K samples from diverse pathology tasks, we trained the pathology-specialized LVLM, OmniPath. Extensive experiments demonstrate that this model significantly outperforms existing methods in diagnostic accuracy and efficiency, providing an interactive, clinically aligned approach for auxiliary diagnosis in a wide range of pathology applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_09521
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis Zhang, Shengxuming Li, Weihan Gao, Tianhong Hu, Jiacong Luo, Haoming Zhang, Xiuming Zhang, Jing Song, Mingli Feng, Zunlei Computer Vision and Pattern Recognition Artificial Intelligence Pathological diagnosis is vital for determining disease characteristics, guiding treatment, and assessing prognosis, relying heavily on detailed, multi-scale analysis of high-resolution whole slide images (WSI). However, existing large vision-language models (LVLMs) are limited by input resolution constraints, hindering their efficiency and accuracy in pathology image analysis. To overcome these issues, we propose two innovative strategies: the mixed task-guided feature enhancement, which directs feature extraction toward lesion-related details across scales, and the prompt-guided detail feature completion, which integrates coarse- and fine-grained features from WSI based on specific prompts without compromising inference speed. Leveraging a comprehensive dataset of 490K samples from diverse pathology tasks, we trained the pathology-specialized LVLM, OmniPath. Extensive experiments demonstrate that this model significantly outperforms existing methods in diagnostic accuracy and efficiency, providing an interactive, clinically aligned approach for auxiliary diagnosis in a wide range of pathology applications.
title	Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2412.09521

Similar Items