Staff View: Adjacent-view Transformers for Supervised Surround-view Depth Estimation

Bibliographic Details
Main Authors:	Guo, Xianda; Yuan, Wenjie; Zhang, Yunpeng; Yang, Tian; Zhang, Chenming; Zhu, Zheng; Zou, Qin; Chen, Long
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2303.07759

_version_	1866908645576409088
author	Guo, Xianda; Yuan, Wenjie; Zhang, Yunpeng; Yang, Tian; Zhang, Chenming; Zhu, Zheng; Zou, Qin; Chen, Long
author_facet	Guo, Xianda; Yuan, Wenjie; Zhang, Yunpeng; Yang, Tian; Zhang, Chenming; Zhu, Zheng; Zou, Qin; Chen, Long
contents	Depth estimation has been widely studied and serves as the fundamental step of 3D perception for robotics and autonomous driving. Though significant progress has been made in monocular depth estimation in the past decades, these attempts are mainly conducted on the KITTI benchmark with only front-view cameras, which ignores the correlations across surround-view cameras. In this paper, we propose an Adjacent-View Transformer for Supervised Surround-view Depth estimation (AVT-SSDepth), to jointly predict the depth maps across multiple surrounding cameras. Specifically, we employ a global-to-local feature extraction module that combines CNN with transformer layers for enriched representations. Further, the adjacent-view attention mechanism is proposed to enable the intra-view and inter-view feature propagation. The former is achieved by the self-attention module within each view, while the latter is realized by the adjacent attention module, which computes the attention across multi-cameras to exchange the multi-scale representations across surround-view feature maps. In addition, AVT-SSDepth has strong cross-dataset generalization. Extensive experiments show that our method achieves superior performance over existing state-of-the-art methods on both DDAD and nuScenes datasets. Code is available at https://github.com/XiandaGuo/SSDepth.
format	Preprint
id	arxiv_https___arxiv_org_abs_2303_07759
institution	arXiv
publishDate	2023
record_format	arxiv
title	Adjacent-view Transformers for Supervised Surround-view Depth Estimation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2303.07759

Similar Items