| Main Authors: | Zhai, Yingjie; Fan, Deng-Ping; Yang, Jufeng; Borji, Ali; Shao, Ling; Han, Junwei; Wang, Liang |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2007.02713 |
| _version_ | 1866913237270790144 |
|---|---|
| author | Zhai, Yingjie; Fan, Deng-Ping; Yang, Jufeng; Borji, Ali; Shao, Ling; Han, Junwei; Wang, Liang |
| author_facet | Zhai, Yingjie; Fan, Deng-Ping; Yang, Jufeng; Borji, Ali; Shao, Ling; Han, Junwei; Wang, Liang |
| contents | Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment and classify objects at various scales. When multi-level features meet multi-modal cues, finding the optimal feature aggregation and multi-modal learning strategy becomes a thorny problem. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network. In particular, we first propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. Then, RGB and depth modalities are fused in a complementary way. Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent. Extensive experiments show that BBS-Net significantly outperforms eighteen SOTA models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach ($\sim 4\%$ improvement in S-measure vs. the top-ranked model, DMRA-iccv2019). In addition, we provide a comprehensive analysis of the generalization ability of different RGB-D datasets and provide a powerful training set for future research. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2007_02713 |
| institution | arXiv |
| publishDate | 2020 |
| record_format | arxiv |
| title | Bifurcated backbone strategy for RGB-D salient object detection |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2007.02713 |