| Main Authors: | Pezold, Simon, Kurylec, Jérôme A., Liechti, Jan S., Müller, Beat P., Lavanchy, Joël L. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.06831 |
Similar Items
Surgical Text-to-Image Generation
by: Nwoye, Chinedu Innocent, et al.
Published: (2024)
Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
by: Nwoye, Chinedu Innocent, et al.
Published: (2023)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)
Recognizing Surgical Phases Anywhere: Few-Shot Test-time Adaptation and Task-graph Guided Refinement
by: Yuan, Kun, et al.
Published: (2025)
Leveraging Foundation Models for Multimodal Graph-Based Action Recognition
by: Ziaeetabar, Fatemeh, et al.
Published: (2025)
Feature Mixing Approach for Detecting Intraoperative Adverse Events in Laparoscopic Roux-en-Y Gastric Bypass Surgery
by: Bose, Rupak, et al.
Published: (2025)
Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images
by: Seidlitz, Silvia, et al.
Published: (2024)
Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance
by: Yin, Lianhao, et al.
Published: (2025)
Scaling Video Pretraining for Surgical Foundation Models
by: Lu, Sicheng, et al.
Published: (2026)
Surgical Depth Anything: Depth Estimation for Surgical Scenes using Foundation Models
by: Lou, Ange, et al.
Published: (2024)
When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room
by: Chen, Keqi, et al.
Published: (2025)
Leveraging Large Language Models to Effectively Generate Visual Data for Canine Musculoskeletal Diagnoses
by: Thißen, Martin, et al.
Published: (2025)
A Generative Foundation Model for Multimodal Histopathology
by: Xiang, Jinxi, et al.
Published: (2026)
AIpparel: A Multimodal Foundation Model for Digital Garments
by: Nakayama, Kiyohiro, et al.
Published: (2024)
Leveraging Foundation Models for Causal Generative Modeling
by: Komanduri, Aneesh, et al.
Published: (2026)
Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)
Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection
by: Matuszka, Tamás, et al.
Published: (2024)
TAP into the Patch Tokens: Leveraging Vision Foundation Model Features for AI-Generated Image Detection
by: Abdullah, Ahmed, et al.
Published: (2026)
Leveraging Large Language Models for Multimodal Search
by: Barbany, Oriol, et al.
Published: (2024)
ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation
by: Li, Peiyu, et al.
Published: (2024)
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review
by: Khan, Ufaq, et al.
Published: (2025)
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
by: Nguyen, Huu Tien, et al.
Published: (2025)
DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
by: Subramanyam, Rakshith, et al.
Published: (2024)
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
by: Gupta, Sharut, et al.
Published: (2025)
SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
by: Wu, Jinlin, et al.
Published: (2026)
LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings
by: Che, Chengan, et al.
Published: (2025)
Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines
by: Wu, Liang, et al.
Published: (2024)
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding
by: Gastager, David, et al.
Published: (2025)
Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information
by: Efimov, Timofey, et al.
Published: (2024)
Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography
by: Zhao, Lin, et al.
Published: (2025)
FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation
by: Imajuku, Yuki, et al.
Published: (2024)
A Color Image Analysis Tool to Help Users Choose a Makeup Foundation Color
by: Mao, Yafei, et al.
Published: (2024)
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
by: Shi, Yucheng, et al.
Published: (2025)
DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation
by: Knaebel, Karim, et al.
Published: (2025)
Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation
by: Cambrin, Daniele Rege, et al.
Published: (2024)
On the Role of Depth in Surgical Vision Foundation Models: An Empirical Study of RGB-D Pre-training
by: Han, John J., et al.
Published: (2026)
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)
Evaluating Large Vision-language Models for Surgical Tool Detection
by: Poudel, Nakul, et al.
Published: (2026)
Finding 3D Scene Analogies with Multimodal Foundation Models
by: Kim, Junho, et al.
Published: (2025)