Library Catalog

Bibliographic Details
Main Authors:	Hu, Ming; Yuan, Kun; Shen, Yaling; Tang, Feilong; Xu, Xiaohao; Zhou, Lin; Li, Wei; Chen, Ying; Xu, Zhongxing; Peng, Zelin; Yan, Siyuan; Srivastav, Vinkle; Song, Diping; Li, Tianbin; Shi, Danli; Ye, Jin; Padoy, Nicolas; Navab, Nassir; He, Junjun; Ge, Zongyuan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.15421

Similar Items

Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)

HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)

CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)

Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)

Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
by: Hu, Ming, et al.
Published: (2024)

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)

Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)

Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
by: Hamoud, Idris, et al.
Published: (2025)

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
by: Srivastav, Vinkle, et al.
Published: (2024)

Recognizing Surgical Phases Anywhere: Few-Shot Test-time Adaptation and Task-graph Guided Refinement
by: Yuan, Kun, et al.
Published: (2025)

Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
by: Chen, Keqi, et al.
Published: (2025)

End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data
by: Zarin, Farahdiba, et al.
Published: (2025)

Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
by: Hassanpour, Jamshid, et al.
Published: (2024)

Jumpstarting Surgical Computer Vision
by: Alapatt, Deepak, et al.
Published: (2023)

Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos
by: Arboit, Lorenzo, et al.
Published: (2025)

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
by: Xu, Zhongxing, et al.
Published: (2024)

OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos
by: Jangir, Ritul, et al.
Published: (2026)

Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
by: Liu, Xinyao, et al.
Published: (2025)

SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy
by: Li, Shi, et al.
Published: (2026)

Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
by: Shen, Yaling, et al.
Published: (2025)

Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
by: Chen, Keqi, et al.
Published: (2026)

Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery
by: Hu, Ming, et al.
Published: (2025)

SpecstatOR: Speckle statistics-based iOCT Segmentation Network for Ophthalmic Surgery
by: Mach, Kristina, et al.
Published: (2024)

Towards Comprehensive Real-Time Scene Understanding in Ophthalmic Surgery through Multimodal Image Fusion
by: Rohrmoser, Nikolo, et al.
Published: (2026)

Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
by: Wang, Xinkun, et al.
Published: (2025)

Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)

A Skull-Adaptive Framework for AI-Based 3D Transcranial Focused Ultrasound Simulation
by: Srivastav, Vinkle, et al.
Published: (2025)

Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
by: Tang, Feilong, et al.
Published: (2024)

Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
by: Liu, Chengzhi, et al.
Published: (2025)

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model
by: Li, Wei, et al.
Published: (2025)

SURGIVID: Annotation-Efficient Surgical Video Object Discovery
by: Köksal, Çağhan, et al.
Published: (2024)

HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation
by: Biagini, Diego, et al.
Published: (2025)

Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion
by: Wei, Meng, et al.
Published: (2026)

ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
by: Holm, Felix, et al.
Published: (2025)

PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics
by: Shen, Yaling, et al.
Published: (2026)

Where are they looking in the operating room?
by: Chen, Keqi, et al.
Published: (2026)

When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room
by: Chen, Keqi, et al.
Published: (2025)

From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature
by: Yuan, Kun, et al.
Published: (2025)

SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting
by: Huang, Yiming, et al.
Published: (2025)