:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Wenjun; Hu, Yan; Fu, Huazhu; Yang, Mingming; Chng, Chin-Boon; Kawasaki, Ryo; Chui, Cheekong; Liu, Jiang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.00322

Similar Items

Hierarchical Context Transformer for Multi-level Semantic Scene Understanding
by: Hao, Luoying, et al.
Published: (2025)

ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection
by: Lin, Junhao, et al.
Published: (2024)

Event-Level Detection of Surgical Instrument Handovers in Videos with Interpretable Vision Models
by: Katsarou, Katerina, et al.
Published: (2026)

Vivim: a Video Vision Mamba for Medical Video Segmentation
by: Yang, Yijun, et al.
Published: (2024)

Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
by: Jiang, Xixi, et al.
Published: (2025)

Out-Of-Distribution Detection with Diversification (Provably)
by: Yao, Haiyun, et al.
Published: (2024)

An Efficient Streaming Video Understanding Framework with Agentic Control
by: Liu, Jinming, et al.
Published: (2026)

Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry
by: Hou, Wenjun, et al.
Published: (2024)

Rethinking Text-Promptable Surgical Instrument Segmentation with Robust Framework
by: Choi, Tae-Min, et al.
Published: (2024)

Surgical Video Understanding with Label Interpolation
by: Kim, Garam, et al.
Published: (2025)

Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?
by: Lin, Ziqin, et al.
Published: (2024)

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
by: Hu, Ming, et al.
Published: (2024)

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

Neural shape reconstruction from multiple views with static pattern projection
by: Furukawa, Ryo, et al.
Published: (2025)

Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos
by: Wei, Rongfeng, et al.
Published: (2023)

Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting
by: Yang, Shuojue, et al.
Published: (2025)

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation
by: Yue, Wenxi, et al.
Published: (2023)

Text-driven Multiplanar Visual Interaction for Semi-supervised Medical Image Segmentation
by: Huang, Kaiwen, et al.
Published: (2025)

TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models
by: Cheng, Jiajun, et al.
Published: (2026)

Video Dataset for Surgical Phase, Keypoint, and Instrument Recognition in Laparoscopic Surgery (PhaKIR)
by: Rueckert, Tobias, et al.
Published: (2025)

Prompt Guiding Multi-Scale Adaptive Sparse Representation-driven Network for Low-Dose CT MAR
by: Shi, Baoshun, et al.
Published: (2025)

Cycle Context Verification for In-Context Medical Image Segmentation
by: Hu, Shishuai, et al.
Published: (2025)

Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos
by: Fang, Zheng, et al.
Published: (2025)

Training-free Detection and 6D Pose Estimation of Unseen Surgical Instruments
by: Hein, Jonas, et al.
Published: (2026)

ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection
by: Wu, Zijian, et al.
Published: (2025)

A Comprehensive Augmentation Framework for Anomaly Detection
by: Lin, Jiang, et al.
Published: (2023)

Prompting Lipschitz-constrained network for multiple-in-one sparse-view CT reconstruction
by: Shi, Baoshun, et al.
Published: (2025)

SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
by: Wu, Jinlin, et al.
Published: (2026)

Augmenting Efficient Real-time Surgical Instrument Segmentation in Video with Point Tracking and Segment Anything
by: Wu, Zijian, et al.
Published: (2024)

Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
by: Gao, Sensen, et al.
Published: (2025)

ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding
by: Chen, Zhen, et al.
Published: (2024)

Real-Time Surgical Instrument Defect Detection via Non-Destructive Testing
by: Ain, Qurrat Ul, et al.
Published: (2025)

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
by: Yang, Ruoliu, et al.
Published: (2026)

Real-time Rendering-based Surgical Instrument Tracking via Evolutionary Optimization
by: Hu, Hanyang, et al.
Published: (2026)

Topicwise Separable Sentence Retrieval for Medical Report Generation
by: Zhao, Junting, et al.
Published: (2024)

PathFL: Multi-Alignment Federated Learning for Pathology Image Segmentation
by: Zhang, Yuan, et al.
Published: (2025)

Vision-Language Model IP Protection via Prompt-based Learning
by: Wang, Lianyu, et al.
Published: (2025)

Amodal Segmentation for Laparoscopic Surgery Video Instruments
by: Shi, Ruohua, et al.
Published: (2024)

UniVRSE: Unified Vision-conditioned Response Semantic Entropy for Hallucination Detection in Medical Vision-Language Models
by: Liao, Zehui, et al.
Published: (2025)