Staff View: Embodiment: Self-Supervised Depth Estimation Based on Camera Models :: Library Catalog


Bibliographic Details
Main Authors:	Zhang, Jinchang, Reddy, Praveen Kumar, Wong, Xue-Iuan, Aloimonos, Yiannis, Lu, Guoyu
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.01565

_version_	1866929478167429120
author	Zhang, Jinchang; Reddy, Praveen Kumar; Wong, Xue-Iuan; Aloimonos, Yiannis; Lu, Guoyu
author_facet	Zhang, Jinchang; Reddy, Praveen Kumar; Wong, Xue-Iuan; Aloimonos, Yiannis; Lu, Guoyu
contents	Depth estimation is a critical topic for robotics and vision-related tasks. In monocular depth estimation, self-supervised methods have great potential compared with supervised learning, which requires expensive ground-truth labeling, because they incur no labeling cost. However, self-supervised learning still lags well behind supervised learning in 3D reconstruction and depth estimation performance. Scale is also a major issue for monocular unsupervised depth estimation, which commonly still requires ground-truth scale from GPS, LiDAR, or existing maps for correction. In the era of deep learning, existing methods primarily rely on exploring relationships between images to train unsupervised neural networks, while the physical properties of the camera itself, such as its intrinsics and extrinsics, are often overlooked. These physical properties are not just mathematical parameters; they embody the camera's interaction with the physical world. By embedding these physical properties into the deep learning model, we can calculate depth priors for ground regions and regions connected to the ground based on physical principles, providing free supervision signals without the need for additional sensors. This approach is not only easy to implement but also enhances all unsupervised methods by embedding the camera's physical properties into the model, thereby achieving an embodied understanding of the real world.
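The abstract states that camera intrinsics and extrinsics allow depth priors for ground regions to be computed from physical principles. As a rough illustration of that idea, and not the authors' implementation, the sketch below intersects each pixel's viewing ray with a flat ground plane. It assumes a pinhole camera in the OpenCV convention (x right, y down, z forward), zero roll, a perfectly flat ground a known height below the camera, and a pitch angle in radians; the function names and parameter values are hypothetical. Because the camera height is in metres, the resulting prior is metric, which is one way such a prior could address the scale issue mentioned above.

```python
import math
from typing import List, Optional


def ground_depth(v: float, fy: float, cy: float,
                 cam_height: float, pitch: float = 0.0) -> Optional[float]:
    """Depth (z along the optical axis) of the flat-ground point seen at image row v.

    Hypothetical helper. Assumes a pinhole camera (x right, y down, z forward),
    zero roll, flat ground cam_height metres below the camera centre, and
    pitch in radians (positive = tilted toward the ground).
    Returns None when the viewing ray never reaches the ground.
    """
    # Normalised ray d = ((u - cx)/fx, (v - cy)/fy, 1); with zero roll only
    # the vertical component affects the ground intersection.
    d_y = (v - cy) / fy
    # World "down" expressed in camera coordinates is (0, cos(pitch), sin(pitch)),
    # so d . down is how far the ray descends per unit of depth.
    descent = d_y * math.cos(pitch) + math.sin(pitch)
    if descent <= 0.0:
        return None  # at or above the horizon: the ray does not hit the ground
    # The point t * d lies on the ground when t * descent == cam_height;
    # since d_z == 1, the depth z equals t.
    return cam_height / descent


def ground_depth_prior(image_height: int, fy: float, cy: float,
                       cam_height: float, pitch: float = 0.0) -> List[Optional[float]]:
    """Per-row metric depth prior for ground pixels (None above the horizon)."""
    return [ground_depth(float(v), fy, cy, cam_height, pitch)
            for v in range(image_height)]


if __name__ == "__main__":
    # Hypothetical camera: fy = 700 px, cy = 240 px, mounted 1.5 m above level ground.
    prior = ground_depth_prior(480, fy=700.0, cy=240.0, cam_height=1.5)
    for row in (240, 300, 380, 479):
        print(row, prior[row])
```

Under these assumptions the depth depends only on the image row, so the prior can be precomputed once per camera and reused for every frame; a real system would still need a way to decide which pixels actually belong to the ground, and rows at or above the horizon receive no prior.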
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_01565
institution	arXiv
publishDate	2024
record_format	arxiv
title	Embodiment: Self-Supervised Depth Estimation Based on Camera Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2408.01565

Similar Items