| Main Authors: | Hu, Ming, Yuan, Kun, Shen, Yaling, Tang, Feilong, Xu, Xiaohao, Zhou, Lin, Li, Wei, Chen, Ying, Xu, Zhongxing, Peng, Zelin, Yan, Siyuan, Srivastav, Vinkle, Song, Diping, Li, Tianbin, Shi, Danli, Ye, Jin, Padoy, Nicolas, Navab, Nassir, He, Junjun, Ge, Zongyuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.15421 |
Similar Items
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)
CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)
Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)
Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
by: Hu, Ming, et al.
Published: (2024)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)
Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)
Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
by: Hamoud, Idris, et al.
Published: (2025)
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
by: Srivastav, Vinkle, et al.
Published: (2024)
Recognizing Surgical Phases Anywhere: Few-Shot Test-time Adaptation and Task-graph Guided Refinement
by: Yuan, Kun, et al.
Published: (2025)
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
by: Chen, Keqi, et al.
Published: (2025)
End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data
by: Zarin, Farahdiba, et al.
Published: (2025)
Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
by: Hassanpour, Jamshid, et al.
Published: (2024)
Jumpstarting Surgical Computer Vision
by: Alapatt, Deepak, et al.
Published: (2023)
Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos
by: Arboit, Lorenzo, et al.
Published: (2025)
Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
by: Xu, Zhongxing, et al.
Published: (2024)
OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos
by: Jangir, Ritul, et al.
Published: (2026)
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
by: Liu, Xinyao, et al.
Published: (2025)
SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy
by: Li, Shi, et al.
Published: (2026)
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
by: Shen, Yaling, et al.
Published: (2025)
Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
by: Chen, Keqi, et al.
Published: (2026)
Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery
by: Hu, Ming, et al.
Published: (2025)
SpecstatOR: Speckle statistics-based iOCT Segmentation Network for Ophthalmic Surgery
by: Mach, Kristina, et al.
Published: (2024)
Towards Comprehensive Real-Time Scene Understanding in Ophthalmic Surgery through Multimodal Image Fusion
by: Rohrmoser, Nikolo, et al.
Published: (2026)
Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
by: Wang, Xinkun, et al.
Published: (2025)
Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)
A Skull-Adaptive Framework for AI-Based 3D Transcranial Focused Ultrasound Simulation
by: Srivastav, Vinkle, et al.
Published: (2025)
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
by: Tang, Feilong, et al.
Published: (2024)
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
by: Liu, Chengzhi, et al.
Published: (2025)
Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model
by: Li, Wei, et al.
Published: (2025)
SURGIVID: Annotation-Efficient Surgical Video Object Discovery
by: Köksal, Çağhan, et al.
Published: (2024)
HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation
by: Biagini, Diego, et al.
Published: (2025)
Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion
by: Wei, Meng, et al.
Published: (2026)
ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
by: Holm, Felix, et al.
Published: (2025)
PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics
by: Shen, Yaling, et al.
Published: (2026)
Where are they looking in the operating room?
by: Chen, Keqi, et al.
Published: (2026)
When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room
by: Chen, Keqi, et al.
Published: (2025)
From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature
by: Yuan, Kun, et al.
Published: (2025)
SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting
by: Huang, Yiming, et al.
Published: (2025)