Bibliographic Details
Main Authors: Zhou, Keyu, Xu, Peisen, Wu, Yahao, Chen, Jiming, Li, Gaofeng, Li, Shunlei
Format: Preprint
Published: 2026
Subjects: Robotics; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.20500
author Zhou, Keyu
Xu, Peisen
Wu, Yahao
Chen, Jiming
Li, Gaofeng
Li, Shunlei
contents Autonomous laparoscopic camera control must maintain a stable and safe surgical view under rapid tool-tissue interactions while remaining interpretable to surgeons. We present a strategy-grounded framework that couples high-level vision-language inference with low-level closed-loop control. Offline, raw surgical videos are parsed into camera-relevant temporal events (e.g., interaction, working-distance deviation, and view-quality degradation) and structured as attributed event graphs. Mining these graphs yields a compact set of reusable camera-handling strategy primitives, which provide structured supervision for learning. Online, a fine-tuned Vision-Language Model (VLM) processes the live laparoscopic view to predict the dominant strategy and discrete image-based motion commands, executed by an IBVS-RCM controller under strict safety constraints; optional speech input enables intuitive human-in-the-loop conditioning. On a surgeon-annotated dataset, event parsing achieves reliable temporal localization (F1-score 0.86), and the mined strategies show strong semantic alignment with expert interpretation (cluster purity 0.81). Extensive ex vivo experiments on silicone phantoms and porcine tissues demonstrate that the proposed system outperforms junior surgeons in standardized camera-handling evaluations, reducing field-of-view centering error by 35.26% and image shaking by 62.33%, while preserving smooth motion and stable working-distance regulation.
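The abstract above reports a cluster purity of 0.81 for the mined strategies against expert interpretation. As a rough illustration of how such a score is commonly computed, the sketch below uses the standard definition of purity: each cluster is credited with the count of its most frequent expert label, and the credits are summed and divided by the total number of items. This is an assumption about the metric, not the authors' evaluation code; the function name `cluster_purity` and the strategy labels in the example are hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_ids, expert_labels):
    """Standard cluster purity: fraction of items whose expert label
    matches the majority expert label of their assigned cluster."""
    if len(cluster_ids) != len(expert_labels):
        raise ValueError("cluster_ids and expert_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count expert labels separately inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, expert_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(c.values()) for c in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical example: 8 mined segments, 3 clusters, made-up strategy names.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["follow", "follow", "zoom", "zoom", "zoom",
          "recenter", "recenter", "follow"]
print(cluster_purity(clusters, labels))  # 6 of 8 items match their cluster majority
```

Purity rises trivially as the number of clusters grows, so a reported value is best read alongside the size of the strategy set, which the abstract describes only as compact.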
format Preprint
id arxiv_https___arxiv_org_abs_2602_20500
institution arXiv
publishDate 2026
record_format arxiv
title Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining
topic Robotics
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.20500