Bibliographic Details
Main Authors: Chen, Yinda; Liu, Che; Liu, Xiaoyu; Arcucci, Rossella; Xiong, Zhiwei
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Computation and Language
Online Access: https://arxiv.org/abs/2403.15992
author Chen, Yinda
Liu, Che
Liu, Xiaoyu
Arcucci, Rossella
Xiong, Zhiwei
contents The burgeoning integration of 3D medical imaging into healthcare has led to a substantial increase in the workload of medical professionals. To assist clinicians in their diagnostic processes and alleviate their workload, the development of a robust system for retrieving similar case studies presents a viable solution. While the concept holds great promise, the field of 3D medical text-image retrieval is currently limited by the absence of robust evaluation benchmarks and curated datasets. To remedy this, our study presents a groundbreaking dataset, BIMCV-R, which includes an extensive collection of 8,069 3D CT volumes, encompassing over 2 million slices, paired with their respective radiological reports. Expanding upon the foundational work of our dataset, we craft a retrieval strategy, MedFinder. This approach employs a dual-stream network architecture, harnessing the potential of large language models to advance the field of medical image retrieval beyond existing text-image retrieval solutions. It marks our preliminary step towards developing a system capable of facilitating text-to-image, image-to-text, and keyword-based retrieval tasks. Our project is available at https://huggingface.co/datasets/cyd0806/BIMCV-R.
format Preprint
id arxiv_https___arxiv_org_abs_2403_15992
institution arXiv
publishDate 2024
record_format arxiv
title BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval
topic Computer Vision and Pattern Recognition
Computation and Language
url https://arxiv.org/abs/2403.15992