:: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Kun; Chen, Tingxuan; Li, Shi; Lavanchy, Joel L.; Heiliger, Christian; Özsoy, Ege; Huang, Yiming; Bai, Long; Navab, Nassir; Srivastav, Vinkle; Ren, Hongliang; Padoy, Nicolas
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.20254

Similar Items

Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)

Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)

HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)

Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)

CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)

Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)

BridgeSplat: Bidirectionally Coupled CT and Non-Rigid Gaussian Splatting for Deformable Intraoperative Surgical Navigation
by: Fehrentz, Maximilian, et al.
Published: (2025)

ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling
by: Özsoy, Ege, et al.
Published: (2024)

PhenoKG: Knowledge Graph-Driven Gene Discovery and Patient Insights from Phenotypes Alone
by: Zaripova, Kamilia, et al.
Published: (2025)

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
by: Srivastav, Vinkle, et al.
Published: (2024)

When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room
by: Chen, Keqi, et al.
Published: (2025)

SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting
by: Huang, Yiming, et al.
Published: (2025)

PanORama: Multiview Consistent Panoptic Segmentation in Operating Rooms
by: Gürbüz, Tuna, et al.
Published: (2026)

Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting
by: Pellegrini, Chantal, et al.
Published: (2026)

RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
by: Pellegrini, Chantal, et al.
Published: (2023)

EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
by: Özsoy, Ege, et al.
Published: (2025)

Beyond Role-Based Surgical Domain Modeling: Generalizable Re-Identification in the Operating Room
by: Wang, Tony Danjun, et al.
Published: (2025)

Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
by: Chen, Keqi, et al.
Published: (2025)

End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data
by: Zarin, Farahdiba, et al.
Published: (2025)

Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
by: Hassanpour, Jamshid, et al.
Published: (2024)

Specialized Foundation Models for Intelligent Operating Rooms
by: Özsoy, Ege, et al.
Published: (2025)

Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
by: Bani-Harouni, David, et al.
Published: (2025)

EHR2Path: Scalable Modeling of Longitudinal Patient Pathways from Multimodal Electronic Health Records
by: Pellegrini, Chantal, et al.
Published: (2025)

Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
by: Hamoud, Idris, et al.
Published: (2025)

Jumpstarting Surgical Computer Vision
by: Alapatt, Deepak, et al.
Published: (2023)

Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos
by: Arboit, Lorenzo, et al.
Published: (2025)

Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion
by: Wei, Meng, et al.
Published: (2026)

TrackOR: Towards Personalized Intelligent Operating Rooms Through Robust Tracking
by: Wang, Tony Danjun, et al.
Published: (2025)

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
by: Özsoy, Ege, et al.
Published: (2025)

Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)

SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy
by: Li, Shi, et al.
Published: (2026)

SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)

Location-Free Scene Graph Generation
by: Özsoy, Ege, et al.
Published: (2023)

Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
by: Chen, Keqi, et al.
Published: (2026)

Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
by: Bani-Harouni, David, et al.
Published: (2025)

UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation
by: Zhou, Yue, et al.
Published: (2025)

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
by: Nwoye, Chinedu Innocent, et al.
Published: (2023)

A Skull-Adaptive Framework for AI-Based 3D Transcranial Focused Ultrasound Simulation
by: Srivastav, Vinkle, et al.
Published: (2025)

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
by: Hu, Ming, et al.
Published: (2024)