Bibliographic Details
Main Authors: Cao, Elton; Lipson, Hod
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2604.13549
Abstract:
  • The conversion of 2D freehand sketches into 3D models remains a pivotal challenge in computer vision, bridging the gap between fluent sketching and CAD. Traditional monocular depth reconstruction techniques are not suited to interpreting line drawings. We propose a generative approach that frames reconstruction as a conditional dense depth estimation task. To this end, we implemented a Latent Diffusion Model (LDM) with a conditioning framework that resolves the inherent ambiguities of orthographic projection. We trained the model on a dataset of over one million image-depth pairs. The framework performed robustly across varying shape complexities, with an average depth error of 5.3 percent.
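To make "conditional dense depth estimation with an LDM" concrete, the standard latent diffusion training objective is sketched below. This is a generic illustration under stated assumptions, not the authors' stated loss: it assumes the ground-truth depth map d is compressed to a latent by an encoder E, the input sketch s is mapped to a conditioning signal by a sketch encoder tau_phi, and a denoiser epsilon_theta learns to predict the Gaussian noise added at timestep t. The symbols E, D, tau_phi, and alpha-bar_t are illustrative names, not taken from the record.

```latex
% Generic conditional latent diffusion objective (assumed; the record does not give the authors' loss)
% d: ground-truth depth map, s: input sketch (condition)
% \mathcal{E}: latent encoder, \tau_\phi: sketch encoder, \epsilon_\theta: denoising network
% \bar\alpha_t: cumulative noise-schedule coefficient at timestep t
z_0 = \mathcal{E}(d), \qquad
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L}(\theta, \phi) =
\mathbb{E}_{(d,\, s),\ \epsilon,\ t}
\left[ \bigl\lVert \epsilon - \epsilon_\theta\bigl(z_t,\ t,\ \tau_\phi(s)\bigr) \bigr\rVert_2^2 \right]
```

Under this formulation, inference would start from Gaussian noise in latent space, iteratively denoise it while conditioning on tau_phi(s), and decode the final latent with a decoder D to obtain the predicted depth map. The conditioning term is where sketch information constrains the otherwise ambiguous depth solution.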