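Staff View: Monocular absolute depth estimation from endoscopy via domain-invariant feature learning and latent consistency :: Library Catalog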

Bibliographic Details
Main Authors:	Li, Hao; Lu, Daiwei; d'Almeida, Jesse; Isik, Dilara; Aghdam, Ehsan Khodapanah; DiSanto, Nick; Acar, Ayberk; Sharma, Susheela; Wu, Jie Ying; Webster III, Robert J.; Oguz, Ipek
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.02247

_version_	1866915595302207488
author	Li, Hao; Lu, Daiwei; d'Almeida, Jesse; Isik, Dilara; Aghdam, Ehsan Khodapanah; DiSanto, Nick; Acar, Ayberk; Sharma, Susheela; Wu, Jie Ying; Webster III, Robert J.; Oguz, Ipek
author_facet	Li, Hao; Lu, Daiwei; d'Almeida, Jesse; Isik, Dilara; Aghdam, Ehsan Khodapanah; DiSanto, Nick; Acar, Ayberk; Sharma, Susheela; Wu, Jie Ying; Webster III, Robert J.; Oguz, Ipek
contents	Monocular depth estimation (MDE) is a critical task to guide autonomous medical robots. However, obtaining absolute (metric) depth from an endoscopy camera in surgical scenes is difficult, which limits supervised learning of depth on real endoscopic images. Current image-level unsupervised domain adaptation methods translate synthetic images with known depth maps into the style of real endoscopic frames and train depth networks using these translated images with their corresponding depth maps. However, a domain gap often remains between real and translated synthetic images. In this paper, we present a latent feature alignment method to improve absolute depth estimation by reducing this domain gap in the context of endoscopic videos of the central airway. Our methods are agnostic to the image translation process and focus on the depth estimation itself. Specifically, the depth network takes translated synthetic and real endoscopic frames as input and learns latent domain-invariant features via adversarial learning and directional feature consistency. The evaluation is conducted on endoscopic videos of central airway phantoms with manually aligned absolute depth maps. Compared to state-of-the-art MDE methods, our approach achieves superior performance on both absolute and relative depth metrics, and consistently improves results across various backbones and pretrained weights. Our code is available at https://github.com/MedICL-VU/MDE.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_02247
institution	arXiv
publishDate	2025
record_format	arxiv
title	Monocular absolute depth estimation from endoscopy via domain-invariant feature learning and latent consistency
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2511.02247
