| Main Authors: | Wang, Diwei, Yuan, Kun, Muller, Candice, Blanc, Frédéric, Padoy, Nicolas, Seo, Hyewon |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.13756 |
Similar Items
KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models
by: Nguyen, Son Hai, et al.
Published: (2025)
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs
by: Wang, Diwei, et al.
Published: (2025)
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)
CARE-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson's Disease Gait Assessment
by: Adeli, Vida, et al.
Published: (2025)
Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)
PhyDeformer: High-Quality Non-Rigid Garment Registration with Physics-Awareness
by: Yu, Boyang, et al.
Published: (2025)
SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos
by: Nwoye, Chinedu Innocent, et al.
Published: (2024)
Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)
Shape Conditioned Human Motion Generation with Diffusion Model
by: Xue, Kebing, et al.
Published: (2024)
A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis
by: Rao, Haocong, et al.
Published: (2024)
Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)
CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)
SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy
by: Li, Shi, et al.
Published: (2026)
On-the-Fly Point Annotation for Fast Medical Video Labeling
by: Meyer, Adrien, et al.
Published: (2024)
State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
by: Hu, Ming, et al.
Published: (2024)
Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset
by: Ranjan, Rahm, et al.
Published: (2024)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)
From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature
by: Yuan, Kun, et al.
Published: (2025)
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
by: Srivastav, Vinkle, et al.
Published: (2024)
DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging
by: Meyer, Adrien, et al.
Published: (2026)
Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
by: Chen, Keqi, et al.
Published: (2026)
SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)
Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)
CoSimGen: Controllable Diffusion Model for Simultaneous Image and Mask Generation
by: Bose, Rupak, et al.
Published: (2025)
Jumpstarting Surgical Computer Vision
by: Alapatt, Deepak, et al.
Published: (2023)
4D Facial Expression Diffusion Model
by: Zou, Kaifeng, et al.
Published: (2023)
DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture
by: He, Xiangteng, et al.
Published: (2025)
BigGait: Learning Gait Representation You Want by Large Vision Models
by: Ye, Dingqiang, et al.
Published: (2024)
Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
by: Hamoud, Idris, et al.
Published: (2025)
Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos
by: Arboit, Lorenzo, et al.
Published: (2025)
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
by: Chen, Keqi, et al.
Published: (2025)
End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data
by: Zarin, Farahdiba, et al.
Published: (2025)
Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
by: Hassanpour, Jamshid, et al.
Published: (2024)
Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis
by: Nihal, Ragib Amin, et al.
Published: (2025)
Scene Change Detection with Vision-Language Representation Learning
by: Sheng, Diwei, et al.
Published: (2026)
Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease
by: Megahed, Youssef, et al.
Published: (2025)
Early Operative Difficulty Assessment in Laparoscopic Cholecystectomy via Snapshot-Centric Video Analysis
by: Sharma, Saurav, et al.
Published: (2025)