Library Catalog

Bibliographic Details
Main Authors:	Shen, Yiqing, Li, Chenjia, Fan, Chenxiao, Unberath, Mathias
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.16718

Similar Items

RVTBench: A Benchmark for Visual Reasoning Tasks
by: Shen, Yiqing, et al.
Published: (2025)

Reasoning Text-to-Video Retrieval via Digital Twin Video Representations and Large Language Models
by: Shen, Yiqing, et al.
Published: (2025)

Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations
by: Shen, Yiqing, et al.
Published: (2025)

Fast Reasoning Segmentation for Images and Videos
by: Shen, Yiqing, et al.
Published: (2025)

Online Reasoning Video Segmentation with Just-in-Time Digital Twins
by: Shen, Yiqing, et al.
Published: (2025)

Counterfactual World Models via Digital Twin-conditioned Video Diffusion
by: Shen, Yiqing, et al.
Published: (2025)

Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
by: Shen, Yiqing, et al.
Published: (2025)

Reasoning Segmentation for Images and Videos: A Survey
by: Shen, Yiqing, et al.
Published: (2025)

Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins
by: Shen, Yiqing, et al.
Published: (2025)

Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
by: Shao, Xinyuan, et al.
Published: (2024)

MoSFormer: Augmenting Temporal Context with Memory of Surgery for Surgical Phase Recognition
by: Ding, Hao, et al.
Published: (2025)

FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
by: Shen, Yiqing, et al.
Published: (2024)

A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness
by: Drenkow, Nathan, et al.
Published: (2025)

Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations
by: Li, Yizhen, et al.
Published: (2025)

From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments
by: Oguine, Kanyifeechukwu J., et al.
Published: (2024)

Hyperspectral Image Recovery Constrained by Multi-Granularity Non-Local Self-Similarity Priors
by: Peng, Zhuoran, et al.
Published: (2025)

TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research
by: Zhang, Han, et al.
Published: (2025)

An Intrinsically Explainable Approach to Detecting Vertebral Compression Fractures in CT Scans via Neurosymbolic Modeling
by: Inigo, Blanca, et al.
Published: (2024)

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)

Spatio-Temporal Attention for Consistent Video Semantic Segmentation in Automated Driving
by: Varghese, Serin, et al.
Published: (2026)

VIRST: Video-Instructed Reasoning Assistant for SpatioTemporal Segmentation
by: Hong, Jihwan, et al.
Published: (2026)

Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
by: Sugandhika, Chinthani, et al.
Published: (2025)

AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction
by: Maksutova, Aiza, et al.
Published: (2026)

Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model
by: Zhang, Ruixin, et al.
Published: (2025)

HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
by: Yang, Yiqing, et al.
Published: (2025)

MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning
by: Tao, Sicheng, et al.
Published: (2025)

TemporalVLM: Video LLMs for Temporal Reasoning in Long Videos
by: Fateh, Fawad Javed, et al.
Published: (2024)

Human-AI Collaboration and Explainability for 2D/3D Registration Quality Assurance
by: Cho, Sue Min, et al.
Published: (2025)

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
by: Gong, Sitong, et al.
Published: (2025)

Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory
by: Zhu, Zhengtong, et al.
Published: (2026)

Towards Controllable Video Synthesis of Routine and Rare OR Events
by: Schneider, Dominik, et al.
Published: (2026)

Online Reasoning Video Object Segmentation
by: Liu, Jinyuan, et al.
Published: (2026)

Causality-Driven Audits of Model Robustness
by: Drenkow, Nathan, et al.
Published: (2024)

Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing
by: Wang, Yijia, et al.
Published: (2025)

Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs
by: Zhu, Rui, et al.
Published: (2026)

Calisthenics Skills Temporal Video Segmentation
by: Finocchiaro, Antonio, et al.
Published: (2025)

Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction
by: Fan, Chenyou, et al.
Published: (2025)

Benchmarking the Robustness of Panoptic Segmentation for Automated Driving
by: Wang, Yiting, et al.
Published: (2024)

Video-based Sign Language Recognition without Temporal Segmentation
by: Huang, Jie, et al.
Published: (2018)

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
by: Huang, Shaofei, et al.
Published: (2024)