Bibliographic Details
Main Authors: Zhang, Jinchang; Reddy, Praveen Kumar; Wong, Xue-Iuan; Aloimonos, Yiannis; Lu, Guoyu
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2408.01565
_version_ 1866929478167429120
author Zhang, Jinchang
Reddy, Praveen Kumar
Wong, Xue-Iuan
Aloimonos, Yiannis
Lu, Guoyu
contents Depth estimation is a critical topic for robotics and vision-related tasks. In monocular depth estimation, supervised learning requires expensive ground-truth labeling, whereas self-supervised methods have great potential because they incur no labeling cost. However, self-supervised learning still lags well behind supervised learning in 3D reconstruction and depth estimation performance. Scale is also a major issue for monocular unsupervised depth estimation, which commonly still needs a ground-truth scale from GPS, LiDAR, or existing maps for correction. In the era of deep learning, existing methods primarily rely on exploring image relationships to train unsupervised neural networks, while the physical properties of the camera itself, such as its intrinsics and extrinsics, are often overlooked. These physical properties are not just mathematical parameters; they embody the camera's interaction with the physical world. By embedding these physical properties into the deep learning model, we can calculate depth priors for ground regions and regions connected to the ground based on physical principles, providing free supervision signals without the need for additional sensors. This approach is easy to implement and, by embedding the camera's physical properties into the model, enhances the performance of all unsupervised methods, thereby achieving an embodied understanding of the real world.
format Preprint
id arxiv_https___arxiv_org_abs_2408_01565
institution arXiv
publishDate 2024
record_format arxiv
title Embodiment: Self-Supervised Depth Estimation Based on Camera Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2408.01565
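
The abstract describes computing depth priors for ground regions from the camera's intrinsics and extrinsics. The sketch below illustrates the underlying geometry only; it is not the authors' implementation. It assumes a distortion-free pinhole camera with no roll, a flat ground plane, and a known camera height. The function name ground_depth_prior and all numeric values are hypothetical and chosen for illustration.

```python
import math


def ground_depth_prior(v, fy, cy, cam_height, pitch=0.0):
    """Depth along the optical axis of the flat-ground point seen at image row v.

    Assumptions (not taken from the record): pinhole camera, no distortion,
    no roll, planar ground. Camera frame: x right, y down, z forward.
    fy, cy      -- vertical focal length and principal point row, in pixels
    cam_height  -- camera height above the ground, in metres
    pitch       -- downward tilt of the optical axis, in radians
    Returns None for rows at or above the horizon.
    """
    # Back-project the row to a ray d = (x, y, 1); only y matters here.
    y = (v - cy) / fy
    # World "down" expressed in the camera frame is (0, cos(pitch), sin(pitch)),
    # so the ground plane is cos(pitch) * P_y + sin(pitch) * P_z = cam_height.
    denom = y * math.cos(pitch) + math.sin(pitch)
    if denom <= 1e-9:
        return None  # the ray is parallel to the ground or points above it
    # With P = t * d and d_z = 1, the depth P_z equals t.
    return cam_height / denom


# Illustrative values: fy = 700 px, cy = 200 px, camera 1.5 m above the ground.
depth_at_row_300 = ground_depth_prior(300, fy=700.0, cy=200.0, cam_height=1.5)
depth_at_horizon = ground_depth_prior(200, fy=700.0, cy=200.0, cam_height=1.5)
```

With zero pitch the formula reduces to depth = fy * cam_height / (v - cy), so rows closer to the horizon map to larger depths. Because cam_height is in metres, the result is a metric depth. In a training pipeline such a prior would typically be applied only to pixels identified as ground, which this sketch does not attempt.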