MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Xiaomin; Xu, Rui; Wei, Pengchen; Qin, Wenkang; Huang, Peixiang; Li, Ziheng; Luo, Lin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.07037

_version_	1866909286287802368
author	Wu, Xiaomin; Xu, Rui; Wei, Pengchen; Qin, Wenkang; Huang, Peixiang; Li, Ziheng; Luo, Lin
author_facet	Wu, Xiaomin; Xu, Rui; Wei, Pengchen; Qin, Wenkang; Huang, Peixiang; Li, Ziheng; Luo, Lin
contents	Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs associated with training and deploying these complex multimodal models, together with a scarcity of high-quality training datasets, create a significant divide between cutting-edge technology and its application in the clinical setting. We have meticulously compiled a dataset of approximately 45,000 cases, covering over 6 different tasks, including the classification of organ tissues, generating pathology report descriptions, and addressing pathology-related questions and answers. We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, and InternLM, with this dataset to enhance instruction-based performance. We conducted a qualitative assessment of the capabilities of the base model and the fine-tuned model in performing image captioning and classification tasks on the specific dataset. The evaluation results demonstrate that the fine-tuned model exhibits proficiency in addressing typical pathological questions. We hope that by making both our models and datasets publicly available, they can be valuable to the medical and research communities.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_07037
institution	arXiv
publishDate	2024
record_format	arxiv
title	PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2408.07037

Similar Items