:: Library Catalog

Bibliographic Details
Main Authors:	Pezold, Simon; Kurylec, Jérôme A.; Liechti, Jan S.; Müller, Beat P.; Lavanchy, Joël L.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.06831

Similar Items

Surgical Text-to-Image Generation
by: Nwoye, Chinedu Innocent, et al.
Published: (2024)

Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
by: Nwoye, Chinedu Innocent, et al.
Published: (2023)

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)

Recognizing Surgical Phases Anywhere: Few-Shot Test-time Adaptation and Task-graph Guided Refinement
by: Yuan, Kun, et al.
Published: (2025)

Leveraging Foundation Models for Multimodal Graph-Based Action Recognition
by: Ziaeetabar, Fatemeh, et al.
Published: (2025)

Feature Mixing Approach for Detecting Intraoperative Adverse Events in Laparoscopic Roux-en-Y Gastric Bypass Surgery
by: Bose, Rupak, et al.
Published: (2025)

Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images
by: Seidlitz, Silvia, et al.
Published: (2024)

Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance
by: Yin, Lianhao, et al.
Published: (2025)

Scaling Video Pretraining for Surgical Foundation Models
by: Lu, Sicheng, et al.
Published: (2026)

Surgical Depth Anything: Depth Estimation for Surgical Scenes using Foundation Models
by: Lou, Ange, et al.
Published: (2024)

When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room
by: Chen, Keqi, et al.
Published: (2025)

Leveraging Large Language Models to Effectively Generate Visual Data for Canine Musculoskeletal Diagnoses
by: Thißen, Martin, et al.
Published: (2025)

A Generative Foundation Model for Multimodal Histopathology
by: Xiang, Jinxi, et al.
Published: (2026)

AIpparel: A Multimodal Foundation Model for Digital Garments
by: Nakayama, Kiyohiro, et al.
Published: (2024)

Leveraging Foundation Models for Causal Generative Modeling
by: Komanduri, Aneesh, et al.
Published: (2026)

Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)

Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection
by: Matuszka, Tamás, et al.
Published: (2024)

TAP into the Patch Tokens: Leveraging Vision Foundation Model Features for AI-Generated Image Detection
by: Abdullah, Ahmed, et al.
Published: (2026)

Leveraging Large Language Models for Multimodal Search
by: Barbany, Oriol, et al.
Published: (2024)

ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation
by: Li, Peiyu, et al.
Published: (2024)

Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review
by: Khan, Ufaq, et al.
Published: (2025)

Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
by: Nguyen, Huu Tien, et al.
Published: (2025)

DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
by: Subramanyam, Rakshith, et al.
Published: (2024)

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
by: Gupta, Sharut, et al.
Published: (2025)

SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
by: Wu, Jinlin, et al.
Published: (2026)

LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings
by: Che, Chengan, et al.
Published: (2025)

Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines
by: Wu, Liang, et al.
Published: (2024)

Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding
by: Gastager, David, et al.
Published: (2025)

Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information
by: Efimov, Timofey, et al.
Published: (2024)

Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography
by: Zhao, Lin, et al.
Published: (2025)

FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation
by: Imajuku, Yuki, et al.
Published: (2024)

A Color Image Analysis Tool to Help Users Choose a Makeup Foundation Color
by: Mao, Yafei, et al.
Published: (2024)

Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
by: Shi, Yucheng, et al.
Published: (2025)

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation
by: Knaebel, Karim, et al.
Published: (2025)

Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation
by: Cambrin, Daniele Rege, et al.
Published: (2024)

On the Role of Depth in Surgical Vision Foundation Models: An Empirical Study of RGB-D Pre-training
by: Han, John J., et al.
Published: (2026)

Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)

Evaluating Large Vision-language Models for Surgical Tool Detection
by: Poudel, Nakul, et al.
Published: (2026)

Finding 3D Scene Analogies with Multimodal Foundation Models
by: Kim, Junho, et al.
Published: (2025)