Staff View: Instrument-tissue Interaction Detection Framework for Surgical Video Understanding :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Wenjun; Hu, Yan; Fu, Huazhu; Yang, Mingming; Chng, Chin-Boon; Kawasaki, Ryo; Chui, Cheekong; Liu, Jiang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.00322

_version_	1866913292685934592
author	Lin, Wenjun; Hu, Yan; Fu, Huazhu; Yang, Mingming; Chng, Chin-Boon; Kawasaki, Ryo; Chui, Cheekong; Liu, Jiang
author_facet	Lin, Wenjun; Hu, Yan; Fu, Huazhu; Yang, Mingming; Chng, Chin-Boon; Kawasaki, Ryo; Chui, Cheekong; Liu, Jiang
contents	Instrument-tissue interaction detection, which helps in understanding surgical activities, is vital for constructing computer-assisted surgery systems, but it poses many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and cannot automatically detect instruments and tissues. Second, existing works do not fully consider the intra- and inter-frame relations between instruments and tissues. In this paper, we propose to represent instrument-tissue interaction as an <instrument class, instrument bounding box, tissue class, tissue bounding box, action class> quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer that enhances features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer that incorporates features of proposals from adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, we propose a Temporal Graph (TG) Layer with intra-frame connections that exploit relationships between instruments and tissues in the same frame and inter-frame connections that model temporal information for the same instance. For evaluation, we build a cataract surgery video dataset (PhacoQ) and a cholecystectomy surgery video dataset (CholecQ). Experimental results show that our model outperforms other state-of-the-art models on both datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_00322
institution	arXiv
publishDate	2024
record_format	arxiv
title	Instrument-tissue Interaction Detection Framework for Surgical Video Understanding
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.00322
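
The quintuple representation described in the contents field can be sketched as a small data structure. This is a minimal illustration only, not the authors' implementation: the class name `InteractionQuintuple`, the field names, the `(x_min, y_min, x_max, y_max)` box convention, and the example labels are all assumptions made here for clarity.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
BoundingBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class InteractionQuintuple:
    """One instrument-tissue interaction in a single video frame (hypothetical type)."""

    instrument_class: str
    instrument_box: BoundingBox
    tissue_class: str
    tissue_box: BoundingBox
    action_class: str

    def as_tuple(self):
        # Return the five elements in the order given in the abstract.
        return (
            self.instrument_class,
            self.instrument_box,
            self.tissue_class,
            self.tissue_box,
            self.action_class,
        )


# Illustrative values only; these labels are not taken from PhacoQ or CholecQ.
example = InteractionQuintuple(
    instrument_class="forceps",
    instrument_box=(10.0, 20.0, 110.0, 80.0),
    tissue_class="cornea",
    tissue_box=(0.0, 0.0, 200.0, 200.0),
    action_class="grasp",
)
```

A detector in this setting would emit a list of such records per frame, which is what distinguishes the task from frame-level classification alone.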
