Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Zhihao, Kumar, Abhinav, Ganesan, Girish Chandar, Liu, Xiaoming
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.04594
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910080235995136
author	Zhang, Zhihao Kumar, Abhinav Ganesan, Girish Chandar Liu, Xiaoming
author_facet	Zhang, Zhihao Kumar, Abhinav Ganesan, Girish Chandar Liu, Xiaoming
contents	Monocular 3D detection (Mono3D) aims to infer 3D bounding boxes from a single RGB image. Without auxiliary sensors such as LiDAR, this task is inherently ill-posed since the 3D-to-2D projection introduces depth ambiguity. Previous works often predict 3D attributes (e.g., depth, size, and orientation) in parallel, overlooking that these attributes are inherently correlated through the 3D-to-2D projection. However, simply enforcing such correlations through sequential prediction can propagate errors across attributes, especially when objects are occluded or truncated, where inaccurate size or orientation predictions can further amplify depth errors. Therefore, neither parallel nor sequential prediction is optimal. In this paper, we propose MonoCoP, an adaptive framework that learns when and how to leverage inter-attribute correlations with two complementary designs. A Chain-of-Prediction (CoP) explores inter-attribute correlations through feature-level learning, propagation, and aggregation, while an Uncertainty-Guided Selector (GS) dynamically switches between CoP and parallel paradigms for each object based on the predicted uncertainty. By combining their strengths, MonoCoP achieves state-of-the-art (SOTA) performance on KITTI, nuScenes, and Waymo, significantly improving depth accuracy, particularly for distant objects.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_04594
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Unleashing the Power of Chain-of-Prediction for Monocular 3D Object Detection Zhang, Zhihao Kumar, Abhinav Ganesan, Girish Chandar Liu, Xiaoming Computer Vision and Pattern Recognition Monocular 3D detection (Mono3D) aims to infer 3D bounding boxes from a single RGB image. Without auxiliary sensors such as LiDAR, this task is inherently ill-posed since the 3D-to-2D projection introduces depth ambiguity. Previous works often predict 3D attributes (e.g., depth, size, and orientation) in parallel, overlooking that these attributes are inherently correlated through the 3D-to-2D projection. However, simply enforcing such correlations through sequential prediction can propagate errors across attributes, especially when objects are occluded or truncated, where inaccurate size or orientation predictions can further amplify depth errors. Therefore, neither parallel nor sequential prediction is optimal. In this paper, we propose MonoCoP, an adaptive framework that learns when and how to leverage inter-attribute correlations with two complementary designs. A Chain-of-Prediction (CoP) explores inter-attribute correlations through feature-level learning, propagation, and aggregation, while an Uncertainty-Guided Selector (GS) dynamically switches between CoP and parallel paradigms for each object based on the predicted uncertainty. By combining their strengths, MonoCoP achieves state-of-the-art (SOTA) performance on KITTI, nuScenes, and Waymo, significantly improving depth accuracy, particularly for distant objects.
title	Unleashing the Power of Chain-of-Prediction for Monocular 3D Object Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2505.04594

Similar Items