Staff View: Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Lintao; Wang, Yinghao; Wang, Chaohui
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.21231

_version_	1866914172121382912
author	Xu, Lintao; Wang, Yinghao; Wang, Chaohui
contents	Occlusion Boundary Estimation (OBE) identifies boundaries arising from both inter-object occlusions and self-occlusion within individual objects. This task is closely related to Monocular Depth Estimation (MDE), which infers depth from a single image, as Occlusion Boundaries (OBs) provide critical geometric cues for resolving depth ambiguities, while depth can conversely refine occlusion reasoning. In this paper, we aim to systematically model and exploit this mutually beneficial relationship. To this end, we propose MoDOT, a novel framework for joint estimation of depth and OBs, which incorporates a new Cross-Attention Strip Module (CASM) to leverage mid-level OB features for depth prediction, and a novel OB-Depth Constraint Loss (OBDCL) to enforce geometric consistency. To facilitate this study, we contribute OB-Hypersim, a large-scale photorealistic dataset with precise depth and self-occlusion-handled OB annotations. Extensive experiments on two synthetic datasets and NYUD-v2 demonstrate that MoDOT achieves significantly better performance than single-task baselines and multi-task competitors. Furthermore, models trained solely on our synthetic data demonstrate strong generalization to real-world scenes without fine-tuning, producing depth maps with sharper boundaries and improved geometric fidelity. Collectively, these results underscore the significant benefits of jointly modeling OBs and depth. Code and resources are available at https://github.com/xul-ops/MoDOT.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_21231
institution	arXiv
publishDate	2025
record_format	arxiv
title	Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2505.21231
