:: Library Catalog

Bibliographic Details
Main Authors:	Li, Bohan; Yang, Shuojue; Peng, Baorui; Guo, Xianda; Zhang, Erli; Tao, Youqi; Duan, Junfeng; Xu, Daguang; Dou, Qi; Jin, Xin; Zeng, Wenjun; Zhao, Hao; Jin, Yueming
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.08712

Similar Items

Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting
by: Yang, Shuojue, et al.
Published: (2024)

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
by: Liu, Haofeng, et al.
Published: (2024)

SurfSurg6D: Geometry Consistent Dense Correspondence for Textureless Surgical Instrument Pose Estimation
by: Shen, Daiyun, et al.
Published: (2026)

ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection
by: Wu, Zijian, et al.
Published: (2025)

Free-DyGS: Camera-Pose-Free Scene Reconstruction for Dynamic Surgical Videos with Gaussian Splatting
by: Li, Qian, et al.
Published: (2024)

OmniNWM: Omniscient Driving Navigation World Models
by: Li, Bohan, et al.
Published: (2025)

Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting
by: Yang, Shuojue, et al.
Published: (2025)

BCRNet: Enhancing Landmark Detection in Laparoscopic Liver Surgery via Bezier Curve Refinement
by: Li, Qian, et al.
Published: (2025)

Instrument-Splatting++: Towards Controllable Surgical Instrument Digital Twin Using Gaussian Splatting
by: Yang, Shuojue, et al.
Published: (2026)

SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence
by: Zeng, Zhitao, et al.
Published: (2025)

ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models
by: Peng, Baorui, et al.
Published: (2026)

Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling
by: He, Yufan, et al.
Published: (2025)

SurgCalib: Gaussian Splatting-Based Hand-Eye Calibration for Robot-Assisted Minimally Invasive Surgery
by: Wu, Zijian, et al.
Published: (2026)

Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos
by: Fang, Zheng, et al.
Published: (2025)

Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning
by: Xu, Mengya, et al.
Published: (2026)

Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence
by: Zeng, Zhitao, et al.
Published: (2026)

Articulated Kinematics Distillation from Video Diffusion Models
by: Li, Xuan, et al.
Published: (2025)

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields
by: Yang, Zhaoyang, et al.
Published: (2026)

Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos
by: Shao, Zhimin, et al.
Published: (2024)

SurgRAW: Multi-Agent Workflow with Chain of Thought Reasoning for Robotic Surgical Video Analysis
by: Low, Chang Han, et al.
Published: (2025)

Systematic Evaluation and Guidelines for Segment Anything Model in Surgical Video Analysis
by: Yuan, Cheng, et al.
Published: (2024)

Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction
by: Li, Bohan, et al.
Published: (2024)

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
by: Wang, Qi, et al.
Published: (2023)

Kinematic-Based Assessment of Surgical Actions in Microanastomosis
by: Meng, Yan, et al.
Published: (2025)

Closed-Loop Unsupervised Representation Disentanglement with $β$-VAE Distillation and Diffusion Probabilistic Feedback
by: Jin, Xin, et al.
Published: (2024)

Surgical Action Planning with Large Language Models
by: Xu, Mengya, et al.
Published: (2025)

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
by: Li, Bohan, et al.
Published: (2024)

One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception
by: Li, Bohan, et al.
Published: (2023)

NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation
by: Xie, Baao, et al.
Published: (2023)

Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
by: Louis, Nathan, et al.
Published: (2021)

SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation
by: Rapuri, Sampath, et al.
Published: (2026)

SurgFed: Language-guided Multi-Task Federated Learning for Surgical Video Understanding
by: Fang, Zheng, et al.
Published: (2026)

SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning
by: Chen, Juo-Tung, et al.
Published: (2025)

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
by: Liu, Haofeng, et al.
Published: (2025)

AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder
by: Jin, Qiaoqiao, et al.
Published: (2024)

RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
by: Fan, Xiang, et al.
Published: (2026)

VAGPO: Vision-augmented Asymmetric Group Preference Optimization for Graph Routing Problems
by: Liu, Shiyan, et al.
Published: (2025)

SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
by: Liu, Haofeng, et al.
Published: (2025)

MViewRouter: Internalizing Geometric Equivariance via Multi-view Alternating Attention for Combinatorial Routing
by: Liu, Shiyan, et al.
Published: (2026)

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
by: Luo, Xiangyang, et al.
Published: (2026)