Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Shu, Hongchao, Soberanis-Mukul, Roger D., Xu, Jiru, Ding, Hao, Ringel, Morgan, Shen, Mali, Sayed, Saif Iftekar, Rafii-Tari, Hedyeh, Unberath, Mathias
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.09443
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915613048307712
author	Shu, Hongchao Soberanis-Mukul, Roger D. Xu, Jiru Ding, Hao Ringel, Morgan Shen, Mali Sayed, Saif Iftekar Rafii-Tari, Hedyeh Unberath, Mathias
author_facet	Shu, Hongchao Soberanis-Mukul, Roger D. Xu, Jiru Ding, Hao Ringel, Morgan Shen, Mali Sayed, Saif Iftekar Rafii-Tari, Hedyeh Unberath, Mathias
contents	Accurate intra-operative localization of the bronchoscope tip relative to patient anatomy remains challenging due to respiratory motion, anatomical variability, and CT-to-body divergence that cause deformation and misalignment between intra-operative views and pre-operative CT. Existing vision-based methods often fail to generalize across domains and patients, leading to residual alignment errors. This work establishes a generalizable foundation for bronchoscopy navigation through a robust vision-based framework and a new synthetic benchmark dataset that enables standardized and reproducible evaluation. We propose a vision-based pose optimization framework for frame-wise 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity computation between real endoscopic RGB frames and CT-rendered depth maps, while a differentiable rendering module iteratively refines camera poses through depth consistency. To enhance reproducibility, we introduce the first public synthetic benchmark dataset for bronchoscopy navigation, addressing the lack of paired CT-endoscopy data. Trained exclusively on synthetic data distinct from the benchmark, our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization. Qualitative results on real patient data further confirm strong cross-domain generalization, achieving consistent frame-wise 2D-3D alignment without domain-specific adaptation. Overall, the proposed framework achieves robust, domain-invariant localization through iterative vision-based optimization, while the new benchmark provides a foundation for standardized progress in vision-based bronchoscopy navigation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_09443
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation Shu, Hongchao Soberanis-Mukul, Roger D. Xu, Jiru Ding, Hao Ringel, Morgan Shen, Mali Sayed, Saif Iftekar Rafii-Tari, Hedyeh Unberath, Mathias Computer Vision and Pattern Recognition Artificial Intelligence Accurate intra-operative localization of the bronchoscope tip relative to patient anatomy remains challenging due to respiratory motion, anatomical variability, and CT-to-body divergence that cause deformation and misalignment between intra-operative views and pre-operative CT. Existing vision-based methods often fail to generalize across domains and patients, leading to residual alignment errors. This work establishes a generalizable foundation for bronchoscopy navigation through a robust vision-based framework and a new synthetic benchmark dataset that enables standardized and reproducible evaluation. We propose a vision-based pose optimization framework for frame-wise 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity computation between real endoscopic RGB frames and CT-rendered depth maps, while a differentiable rendering module iteratively refines camera poses through depth consistency. To enhance reproducibility, we introduce the first public synthetic benchmark dataset for bronchoscopy navigation, addressing the lack of paired CT-endoscopy data. Trained exclusively on synthetic data distinct from the benchmark, our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization. Qualitative results on real patient data further confirm strong cross-domain generalization, achieving consistent frame-wise 2D-3D alignment without domain-specific adaptation. Overall, the proposed framework achieves robust, domain-invariant localization through iterative vision-based optimization, while the new benchmark provides a foundation for standardized progress in vision-based bronchoscopy navigation.
title	BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2511.09443

Similar Items