Bibliographic Details
Main Authors: Boya, Wang; Shuo, Wang; Dong, Ye; Ziwen, Dou
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2309.09272
author Boya, Wang
Shuo, Wang
Dong, Ye
Ziwen, Dou
contents With the widespread use of self-supervised monocular depth estimation in robotics and autonomous driving, model efficiency is becoming increasingly important. Most current approaches rely on much larger and more complex networks to improve the precision of depth estimation. Some researchers have incorporated Transformers into self-supervised monocular depth estimation to achieve better performance. However, this approach leads to a large parameter count and high computational cost. We present a fully convolutional depth estimation network using contextual feature fusion. In contrast to UNet++ and HRNet, we use high-resolution and low-resolution features, rather than long-range fusion, to preserve information on small targets and fast-moving objects. We further improve depth estimation results by employing lightweight, convolution-based channel attention in the decoder stage. Our method reduces the number of parameters without sacrificing accuracy. Experiments on the KITTI benchmark show that our method achieves better results than many large models, such as Monodepth2, with only 30% of the parameters. The source code is available at https://github.com/boyagesmile/DNA-Depth.
format Preprint
id arxiv_https___arxiv_org_abs_2309_09272
institution arXiv
publishDate 2023
record_format arxiv
title Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2309.09272