Staff View: Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Keyu; Xu, Peisen; Wu, Yahao; Chen, Jiming; Li, Gaofeng; Li, Shunlei
Format:	Preprint
Published:	2026
Subjects:	Robotics; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.20500

_version_	1866908849813848064
author	Zhou, Keyu; Xu, Peisen; Wu, Yahao; Chen, Jiming; Li, Gaofeng; Li, Shunlei
author_facet	Zhou, Keyu; Xu, Peisen; Wu, Yahao; Chen, Jiming; Li, Gaofeng; Li, Shunlei
contents	Autonomous laparoscopic camera control must maintain a stable and safe surgical view under rapid tool-tissue interactions while remaining interpretable to surgeons. We present a strategy-grounded framework that couples high-level vision-language inference with low-level closed-loop control. Offline, raw surgical videos are parsed into camera-relevant temporal events (e.g., interaction, working-distance deviation, and view-quality degradation) and structured as attributed event graphs. Mining these graphs yields a compact set of reusable camera-handling strategy primitives, which provide structured supervision for learning. Online, a fine-tuned Vision-Language Model (VLM) processes the live laparoscopic view to predict the dominant strategy and discrete image-based motion commands, executed by an IBVS-RCM controller under strict safety constraints; optional speech input enables intuitive human-in-the-loop conditioning. On a surgeon-annotated dataset, event parsing achieves reliable temporal localization (F1-score 0.86), and the mined strategies show strong semantic alignment with expert interpretation (cluster purity 0.81). Extensive ex vivo experiments on silicone phantoms and porcine tissues demonstrate that the proposed system outperforms junior surgeons in standardized camera-handling evaluations, reducing field-of-view centering error by 35.26% and image shaking by 62.33%, while preserving smooth motion and stable working-distance regulation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_20500
institution	arXiv
publishDate	2026
record_format	arxiv
title	Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining
topic	Robotics; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.20500

Similar Items