| Main Authors: | Zhou, Rulin, Wang, Guankun, Wang, An, Ma, Yujie, Ouyang, Lixin, Cui, Bolin, Li, Junyan, Zhu, Chaowei, Li, Mingyang, Chen, Ming, Zhong, Xiaopin, Lu, Peng, Wang, Jiankun, Liu, Xianming, Ren, Hongliang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.20636 |
Similar Items
Bridging Vision and Language for Robust Context-Aware Surgical Point Tracking: The VL-SurgPT Dataset and Benchmark
by: Zhou, Rulin, et al.
Published: (2025)
TSUBF-Net: Trans-Spatial UNet-like Network with Bi-direction Fusion for Segmentation of Adenoid Hypertrophy in CT
by: Zhou, Rulin, et al.
Published: (2024)
Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos
by: Yu, Jieming, et al.
Published: (2024)
SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting
by: Huang, Yiming, et al.
Published: (2025)
Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks
by: Zhong, Xiaopin, et al.
Published: (2022)
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision
by: Zhou, Rulin, et al.
Published: (2025)
SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
by: Bai, Long, et al.
Published: (2024)
How can reasoning capability empower the AI copilot robot in endoscopic surgery
by: Wang, Guankun, et al.
Published: (2026)
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
by: Wang, Guankun, et al.
Published: (2024)
EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control
by: Wang, An, et al.
Published: (2025)
Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection
by: Tang, Rui, et al.
Published: (2025)
OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery
by: Bai, Long, et al.
Published: (2024)
SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)
BleedOrigin: Dynamic Bleeding Source Localization in Endoscopic Submucosal Dissection via Dual-Stage Detection and Tracking
by: Xu, Mengya, et al.
Published: (2025)
SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
by: Wu, Jinlin, et al.
Published: (2026)
EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis
by: Tan, Qiaozhi, et al.
Published: (2024)
TMR-VLA: Vision-Language-Action Model for Magnetic Motion Control of Tri-leg Silicone-based Soft Robot
by: Tang, Ruijie, et al.
Published: (2026)
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
by: Wu, Junyan, et al.
Published: (2024)
LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking
by: Li, Yunfeng, et al.
Published: (2025)
EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy
by: Ng, Chi Kit, et al.
Published: (2025)
DeTracker: Motion-decoupled Vehicle Detection and Tracking in Unstabilized Satellite Videos
by: Chen, Jiajun, et al.
Published: (2026)
Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement
by: Bai, Long, et al.
Published: (2025)
PDZSeg: Adapting the Foundation Model for Dissection Zone Segmentation with Visual Prompts in Robot-assisted Endoscopic Submucosal Dissection
by: Xu, Mengya, et al.
Published: (2024)
SurgTrack: CAD-Free 3D Tracking of Real-world Surgical Instruments
by: Guo, Wenwu, et al.
Published: (2024)
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
by: Wang, Guankun, et al.
Published: (2024)
UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots
by: Yin, Kangning, et al.
Published: (2025)
Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking
by: Zhang, Chao, et al.
Published: (2026)
SurgPLAN++: Universal Surgical Phase Localization Network for Online and Offline Inference
by: Chen, Zhen, et al.
Published: (2024)
RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker
by: Li, Yunfeng, et al.
Published: (2024)
Intuitive Surgical SurgToolLoc and SurgVU Challenges Results: 2022-2025
by: Zia, Aneeq, et al.
Published: (2023)
Surgical Visual Understanding (SurgVU) Dataset
by: Zia, Aneeq, et al.
Published: (2025)
EndoARSS: Adapting Spatially-Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery
by: Wang, Guankun, et al.
Published: (2025)
EndoARSS: Adapting Spatially Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery
by: Guankun Wang, et al.
Published: (2025)
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
by: Liu, Haofeng, et al.
Published: (2025)
OmniTracker: Unifying Object Tracking by Tracking-with-Detection
by: Wang, Junke, et al.
Published: (2023)
Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression
by: Li, Daxin, et al.
Published: (2024)
Collision Risk Quantification and Conflict Resolution in Trajectory Tracking for Acceleration-Actuated Multi-Robot Systems
by: Li, Xiaoxiao, et al.
Published: (2025)
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
by: Liu, Haoyue, et al.
Published: (2025)
SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking
by: Wu, Zijian, et al.
Published: (2025)