| Main Authors: | Yuan, Kun, Chen, Tingxuan, Li, Shi, Lavanchy, Joel L., Heiliger, Christian, Özsoy, Ege, Huang, Yiming, Bai, Long, Navab, Nassir, Srivastav, Vinkle, Ren, Hongliang, Padoy, Nicolas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.20254 |
Similar Items
Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)
Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)
CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)
Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)
BridgeSplat: Bidirectionally Coupled CT and Non-Rigid Gaussian Splatting for Deformable Intraoperative Surgical Navigation
by: Fehrentz, Maximilian, et al.
Published: (2025)
ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling
by: Özsoy, Ege, et al.
Published: (2024)
PhenoKG: Knowledge Graph-Driven Gene Discovery and Patient Insights from Phenotypes Alone
by: Zaripova, Kamilia, et al.
Published: (2025)
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
by: Srivastav, Vinkle, et al.
Published: (2024)
When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room
by: Chen, Keqi, et al.
Published: (2025)
SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting
by: Huang, Yiming, et al.
Published: (2025)
PanORama: Multiview Consistent Panoptic Segmentation in Operating Rooms
by: Gürbüz, Tuna, et al.
Published: (2026)
Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting
by: Pellegrini, Chantal, et al.
Published: (2026)
RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
by: Pellegrini, Chantal, et al.
Published: (2023)
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
by: Özsoy, Ege, et al.
Published: (2025)
Beyond Role-Based Surgical Domain Modeling: Generalizable Re-Identification in the Operating Room
by: Wang, Tony Danjun, et al.
Published: (2025)
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
by: Chen, Keqi, et al.
Published: (2025)
End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data
by: Zarin, Farahdiba, et al.
Published: (2025)
Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
by: Hassanpour, Jamshid, et al.
Published: (2024)
Specialized Foundation Models for Intelligent Operating Rooms
by: Özsoy, Ege, et al.
Published: (2025)
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
by: Bani-Harouni, David, et al.
Published: (2025)
EHR2Path: Scalable Modeling of Longitudinal Patient Pathways from Multimodal Electronic Health Records
by: Pellegrini, Chantal, et al.
Published: (2025)
Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
by: Hamoud, Idris, et al.
Published: (2025)
Jumpstarting Surgical Computer Vision
by: Alapatt, Deepak, et al.
Published: (2023)
Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos
by: Arboit, Lorenzo, et al.
Published: (2025)
Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion
by: Wei, Meng, et al.
Published: (2026)
TrackOR: Towards Personalized Intelligent Operating Rooms Through Robust Tracking
by: Wang, Tony Danjun, et al.
Published: (2025)
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
by: Özsoy, Ege, et al.
Published: (2025)
Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)
SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy
by: Li, Shi, et al.
Published: (2026)
SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)
Location-Free Scene Graph Generation
by: Özsoy, Ege, et al.
Published: (2023)
Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
by: Chen, Keqi, et al.
Published: (2026)
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
by: Bani-Harouni, David, et al.
Published: (2025)
UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation
by: Zhou, Yue, et al.
Published: (2025)
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
by: Nwoye, Chinedu Innocent, et al.
Published: (2023)
A Skull-Adaptive Framework for AI-Based 3D Transcranial Focused Ultrasound Simulation
by: Srivastav, Vinkle, et al.
Published: (2025)
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
by: Hu, Ming, et al.
Published: (2024)