Bibliographic Details
Main Authors:	Wang, Hongyu; Long, Yonghao; Chen, Yueyao; Yip, Hon-Chi; Scheppach, Markus; Chiu, Philip Wai-Yan; Yam, Yeung; Meng, Helen Mei-Ling; Dou, Qi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.04716

_version_	1866908394402611200
author	Wang, Hongyu; Long, Yonghao; Chen, Yueyao; Yip, Hon-Chi; Scheppach, Markus; Chiu, Philip Wai-Yan; Yam, Yeung; Meng, Helen Mei-Ling; Dou, Qi
author_facet	Wang, Hongyu; Long, Yonghao; Chen, Yueyao; Yip, Hon-Chi; Scheppach, Markus; Chiu, Philip Wai-Yan; Yam, Yeung; Meng, Helen Mei-Ling; Dou, Qi
contents	Endoscopic Submucosal Dissection (ESD) is a well-established technique for removing epithelial lesions. Predicting dissection trajectories in ESD videos offers significant potential for enhancing surgical skill training and simplifying the learning process, yet this area remains underexplored. While imitation learning has shown promise in acquiring skills from expert demonstrations, challenges persist in handling uncertain future movements, learning geometric symmetries, and generalizing to diverse surgical scenarios. To address these challenges, we introduce a novel approach: Implicit Diffusion Policy with Equivariant Representations for Imitation Learning (iDPOE). Our method models expert behavior through a joint state-action distribution, capturing the stochastic nature of dissection trajectories and enabling robust visual representation learning across various endoscopic views. By incorporating a diffusion model into policy learning, iDPOE ensures efficient training and sampling, leading to more accurate predictions and better generalization. Additionally, we enhance the model's ability to generalize to geometric symmetries by embedding equivariance into the learning process. To address state mismatches, we develop a forward-process guided action inference strategy for conditional sampling. Using an ESD video dataset of nearly 2000 clips, experimental results show that our approach surpasses state-of-the-art methods, both explicit and implicit, in trajectory prediction. To the best of our knowledge, this is the first application of imitation learning to surgical skill development for dissection trajectory prediction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_04716
institution	arXiv
publishDate	2025
record_format	arxiv
title	Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.04716