Bibliographic Details
Main Authors: Ghelichkhan, Elham; Tasdizen, Tolga
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2503.01037
author Ghelichkhan, Elham
Tasdizen, Tolga
contents Chest diseases rank among the most prevalent and dangerous global health issues. Deep learning models for object detection and phrase grounding interpret complex radiology data to assist healthcare professionals in diagnosis. Object detection localizes abnormalities for class labels, whereas phrase grounding localizes abnormalities described by textual phrases. This paper investigates how text enhances abnormality localization in chest X-rays by comparing the performance and explainability of these two tasks. To establish an explainability baseline, we propose an automatic pipeline that uses radiologists' eye-tracking data to generate image regions for report sentences. The phrase grounding model's better performance (mIoU 0.36 vs. 0.20) and explainability (containment ratio 0.48 vs. 0.26) indicate that text is effective in enhancing chest X-ray abnormality localization.
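The abstract reports localization quality as mIoU and explainability as a containment ratio. The sketch below shows, under stated assumptions, how such box-level metrics are commonly computed; it is not the authors' code. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and one prediction paired with one reference box. The record does not define the containment ratio, so the containment_ratio function here is a hypothetical interpretation (fraction of the reference region's area that lies inside the predicted box).

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate (inverted) boxes count as zero area."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (zero if they do not overlap)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: List[Box], targets: List[Box]) -> float:
    """Mean IoU over prediction/target pairs (assumes one-to-one pairing)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, targets)) / len(preds)


def containment_ratio(reference: Box, pred: Box) -> float:
    """HYPOTHETICAL definition: share of the reference region (e.g. a
    gaze-derived region) that falls inside the predicted box."""
    ref_area = area(reference)
    return intersection(reference, pred) / ref_area if ref_area > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))                # 1/7
    print(mean_iou([(0, 0, 2, 2), (0, 0, 1, 1)],
                   [(1, 1, 3, 3), (0, 0, 1, 1)]))          # (1/7 + 1) / 2
    print(containment_ratio((0, 0, 1, 1), (0, 0, 2, 2)))  # 1.0
```

Two boxes of area 4 overlapping in a unit square give IoU 1 / (4 + 4 - 1) = 1/7. Unlike IoU, the containment measure sketched here is asymmetric: a large predicted box that fully covers a small reference region scores 1.0, which is why it is framed as an explainability check rather than a localization score.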
format Preprint
id arxiv_https___arxiv_org_abs_2503_01037
institution arXiv
publishDate 2025
record_format arxiv
title A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2503.01037