Staff View: Textual Inversion and Self-supervised Refinement for Radiology Report Generation :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Yuanjiang; Li, Hongxiang; Wu, Xuan; Cao, Meng; Huang, Xiaoshuang; Zhu, Zhihong; Liao, Peixi; Chen, Hu; Zhang, Yi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.20607

_version_	1866929375934414848
author	Luo, Yuanjiang; Li, Hongxiang; Wu, Xuan; Cao, Meng; Huang, Xiaoshuang; Zhu, Zhihong; Liao, Peixi; Chen, Hu; Zhang, Yi
author_facet	Luo, Yuanjiang; Li, Hongxiang; Wu, Xuan; Cao, Meng; Huang, Xiaoshuang; Zhu, Zhihong; Liao, Peixi; Chen, Hu; Zhang, Yi
contents	Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we propose Textual Inversion and Self-supervised Refinement (TISR) to address these two issues. Specifically, textual inversion projects text and images into the same space by representing images as pseudo words, eliminating the cross-modal gap. Subsequently, self-supervised refinement refines these pseudo words through a contrastive loss computed between images and texts, enhancing the fidelity of generated reports to images. Notably, TISR is orthogonal to most existing methods and is plug-and-play. We conduct experiments on two widely used public datasets and achieve significant improvements on various baselines, which demonstrates the effectiveness and generalization of TISR. The code will be available soon.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_20607
institution	arXiv
publishDate	2024
record_format	arxiv
title	Textual Inversion and Self-supervised Refinement for Radiology Report Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2405.20607
