| Main Authors: | Song, Ziyang, Zang, Zelin, Ye, Xiaofan, Xu, Boqiang, Bai, Long, Wu, Jinlin, Ren, Hongliang, Liu, Hongbin, Luo, Jiebo, Lei, Zhen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.06921 |
Similar Items
Anatomy-R1: Enhancing Anatomy Reasoning in Multimodal Large Language Models via Anatomical Similarity Curriculum and Group Diversity Augmentation
by: Song, Ziyang, et al.
Published: (2025)
SurgPLAN++: Universal Surgical Phase Localization Network for Online and Offline Inference
by: Chen, Zhen, et al.
Published: (2024)
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding
by: Chen, Zhen, et al.
Published: (2024)
Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation
by: Liang, Xusheng, et al.
Published: (2025)
SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
by: Wu, Jinlin, et al.
Published: (2026)
Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping
by: Huang, Yiming, et al.
Published: (2025)
SurgLLM: A Versatile Large Multimodal Model with Spatial Focus and Temporal Awareness for Surgical Video Understanding
by: Chen, Zhen, et al.
Published: (2025)
Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models
by: Zhou, Lihua, et al.
Published: (2025)
Endo-4DGX: Robust Endoscopic Scene Reconstruction and Illumination Correction with Gaussian Splatting
by: Huang, Yiming, et al.
Published: (2025)
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery
by: Wang, Guankun, et al.
Published: (2025)
PCaM: A Progressive Focus Attention-Based Information Fusion Method for Improving Vision Transformer Domain Adaptation
by: Zang, Zelin, et al.
Published: (2025)
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)
Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics
by: Xu, Huan, et al.
Published: (2024)
Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning
by: Xu, Huan, et al.
Published: (2024)
ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems
by: Zhang, Yiming, et al.
Published: (2025)
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
by: Wang, Guankun, et al.
Published: (2024)
Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos
by: Wei, Rongfeng, et al.
Published: (2023)
SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model
by: Wang, Guankun, et al.
Published: (2025)
Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models
by: Yu, Yongsheng, et al.
Published: (2024)
F2PASeg: Feature Fusion for Pituitary Anatomy Segmentation in Endoscopic Surgery
by: Chen, Lumin, et al.
Published: (2025)
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy
by: Bai, Long, et al.
Published: (2024)
VS-Assistant: Versatile Surgery Assistant on the Demand of Surgeons
by: Chen, Zhen, et al.
Published: (2024)
Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection
by: Zhou, Lihua, et al.
Published: (2026)
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
by: Fu, Rao, et al.
Published: (2024)
SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement
by: Lei, Zeyu, et al.
Published: (2025)
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
by: Zeng, Ziyun, et al.
Published: (2025)
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese
by: Lyu, Hanjia, et al.
Published: (2025)
SurgTrack: CAD-Free 3D Tracking of Real-world Surgical Instruments
by: Guo, Wenwu, et al.
Published: (2024)
SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
by: Xu, Mengya, et al.
Published: (2025)
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
by: Lin, Chenwei, et al.
Published: (2024)
TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Dual-Level Scale-Oriented Contrast
by: Cui, Beilei, et al.
Published: (2025)
Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery
by: Xu, Mengya, et al.
Published: (2024)
RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering
by: Zhang, Chengyi, et al.
Published: (2026)
SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot
by: Wu, Jinlin, et al.
Published: (2024)
More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery
by: Dong, Wenzhen, et al.
Published: (2025)
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
by: Cui, Beilei, et al.
Published: (2024)
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
by: Huang, Jinfa, et al.
Published: (2024)
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models
by: Lin, Hongzhan, et al.
Published: (2024)
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking
by: Li, Yuying, et al.
Published: (2024)
A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery
by: Xu, Mengya, et al.
Published: (2024)