:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Diwei; Yuan, Kun; Muller, Candice; Blanc, Frédéric; Padoy, Nicolas; Seo, Hyewon
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.13756

Similar Items

KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models
by: Nguyen, Son Hai, et al.
Published: (2025)

AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs
by: Wang, Diwei, et al.
Published: (2025)

Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)

HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)

fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)

CARE-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson's Disease Gait Assessment
by: Adeli, Vida, et al.
Published: (2025)

Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
by: Chen, Tingxuan, et al.
Published: (2025)

PhyDeformer: High-Quality Non-Rigid Garment Registration with Physics-Awareness
by: Yu, Boyang, et al.
Published: (2025)

SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos
by: Nwoye, Chinedu Innocent, et al.
Published: (2024)

Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)

Shape Conditioned Human Motion Generation with Diffusion Model
by: Xue, Kebing, et al.
Published: (2024)

A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis
by: Rao, Haocong, et al.
Published: (2024)

Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)

CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)

SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy
by: Li, Shi, et al.
Published: (2026)

On-the-Fly Point Annotation for Fast Medical Video Labeling
by: Meyer, Adrien, et al.
Published: (2024)

State-Change Learning for Prediction of Future Events in Endoscopic Videos
by: Sharma, Saurav, et al.
Published: (2025)

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
by: Hu, Ming, et al.
Published: (2024)

Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset
by: Ranjan, Rahm, et al.
Published: (2024)

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)

From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature
by: Yuan, Kun, et al.
Published: (2025)

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
by: Srivastav, Vinkle, et al.
Published: (2024)

DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging
by: Meyer, Adrien, et al.
Published: (2026)

Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
by: Chen, Keqi, et al.
Published: (2026)

SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)

Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)

CoSimGen: Controllable Diffusion Model for Simultaneous Image and Mask Generation
by: Bose, Rupak, et al.
Published: (2025)

Jumpstarting Surgical Computer Vision
by: Alapatt, Deepak, et al.
Published: (2023)

4D Facial Expression Diffusion Model
by: Zou, Kaifeng, et al.
Published: (2023)

DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture
by: He, Xiangteng, et al.
Published: (2025)

BigGait: Learning Gait Representation You Want by Large Vision Models
by: Ye, Dingqiang, et al.
Published: (2024)

Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition
by: Hamoud, Idris, et al.
Published: (2025)

Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos
by: Arboit, Lorenzo, et al.
Published: (2025)

Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
by: Chen, Keqi, et al.
Published: (2025)

End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data
by: Zarin, Farahdiba, et al.
Published: (2025)

Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
by: Hassanpour, Jamshid, et al.
Published: (2024)

Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis
by: Nihal, Ragib Amin, et al.
Published: (2025)

Scene Change Detection with Vision-Language Representation Learning
by: Sheng, Diwei, et al.
Published: (2026)

Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease
by: Megahed, Youssef, et al.
Published: (2025)

Early Operative Difficulty Assessment in Laparoscopic Cholecystectomy via Snapshot-Centric Video Analysis
by: Sharma, Saurav, et al.
Published: (2025)