Bibliographic Details
Main Authors: Chen, Hao; Zhao, Wei; Li, Yingli; Zhong, Tianyang; Wang, Yisong; Shang, Youlan; Guo, Lei; Han, Junwei; Liu, Tianming; Liu, Jun; Zhang, Tuo
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2409.19330
contents Medical image analysis is crucial in modern radiological diagnostics, especially given the exponential growth in medical imaging data, and the demand for automated report generation systems has become increasingly urgent. While prior research has mainly focused on using machine learning and multimodal language models for 2D medical images, report generation for 3D medical images has been less explored because of data scarcity and computational complexity. This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model designed for generating radiology reports from 3D CT scans, particularly chest CTs. Extensive experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality. Few comparable methods currently exist (among them the partially open-source CT2Rep and the open-source M3D), so we ensured a fair comparison through appropriate data conversion and evaluation methodologies. Experimental results indicate that 3D-CT-GPT enhances diagnostic accuracy and report coherence, establishing itself as a robust solution for clinical radiology report generation. Future work will focus on expanding the dataset and further optimizing the model to enhance its performance and applicability.
format Preprint
id arxiv_https___arxiv_org_abs_2409_19330
institution arXiv
publishDate 2024
record_format arxiv
title 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2409.19330