| Main Authors: | Shen, Yiqing, Li, Chenjia, Fan, Chenxiao, Unberath, Mathias |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.16718 |
Similar Items
RVTBench: A Benchmark for Visual Reasoning Tasks
by: Shen, Yiqing, et al.
Published: (2025)
Reasoning Text-to-Video Retrieval via Digital Twin Video Representations and Large Language Models
by: Shen, Yiqing, et al.
Published: (2025)
Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations
by: Shen, Yiqing, et al.
Published: (2025)
Fast Reasoning Segmentation for Images and Videos
by: Shen, Yiqing, et al.
Published: (2025)
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
by: Shen, Yiqing, et al.
Published: (2025)
Counterfactual World Models via Digital Twin-conditioned Video Diffusion
by: Shen, Yiqing, et al.
Published: (2025)
Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
by: Shen, Yiqing, et al.
Published: (2025)
Reasoning Segmentation for Images and Videos: A Survey
by: Shen, Yiqing, et al.
Published: (2025)
Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins
by: Shen, Yiqing, et al.
Published: (2025)
Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
by: Shao, Xinyuan, et al.
Published: (2024)
MoSFormer: Augmenting Temporal Context with Memory of Surgery for Surgical Phase Recognition
by: Ding, Hao, et al.
Published: (2025)
FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
by: Shen, Yiqing, et al.
Published: (2024)
A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness
by: Drenkow, Nathan, et al.
Published: (2025)
Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations
by: Li, Yizhen, et al.
Published: (2025)
From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments
by: Oguine, Kanyifeechukwu J., et al.
Published: (2024)
Hyperspectral Image Recovery Constrained by Multi-Granularity Non-Local Self-Similarity Priors
by: Peng, Zhuoran, et al.
Published: (2025)
TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research
by: Zhang, Han, et al.
Published: (2025)
An Intrinsically Explainable Approach to Detecting Vertebral Compression Fractures in CT Scans via Neurosymbolic Modeling
by: Inigo, Blanca, et al.
Published: (2024)
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)
Spatio-Temporal Attention for Consistent Video Semantic Segmentation in Automated Driving
by: Varghese, Serin, et al.
Published: (2026)
VIRST: Video-Instructed Reasoning Assistant for SpatioTemporal Segmentation
by: Hong, Jihwan, et al.
Published: (2026)
Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
by: Sugandhika, Chinthani, et al.
Published: (2025)
AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction
by: Maksutova, Aiza, et al.
Published: (2026)
Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model
by: Zhang, Ruixin, et al.
Published: (2025)
HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
by: Yang, Yiqing, et al.
Published: (2025)
MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning
by: Tao, Sicheng, et al.
Published: (2025)
TemporalVLM: Video LLMs for Temporal Reasoning in Long Videos
by: Fateh, Fawad Javed, et al.
Published: (2024)
Human-AI Collaboration and Explainability for 2D/3D Registration Quality Assurance
by: Cho, Sue Min, et al.
Published: (2025)
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
by: Gong, Sitong, et al.
Published: (2025)
Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory
by: Zhu, Zhengtong, et al.
Published: (2026)
Towards Controllable Video Synthesis of Routine and Rare OR Events
by: Schneider, Dominik, et al.
Published: (2026)
Online Reasoning Video Object Segmentation
by: Liu, Jinyuan, et al.
Published: (2026)
Causality-Driven Audits of Model Robustness
by: Drenkow, Nathan, et al.
Published: (2024)
Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing
by: Wang, Yijia, et al.
Published: (2025)
Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs
by: Zhu, Rui, et al.
Published: (2026)
Calisthenics Skills Temporal Video Segmentation
by: Finocchiaro, Antonio, et al.
Published: (2025)
Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction
by: Fan, Chenyou, et al.
Published: (2025)
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving
by: Wang, Yiting, et al.
Published: (2024)
Video-based Sign Language Recognition without Temporal Segmentation
by: Huang, Jie, et al.
Published: (2018)
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
by: Huang, Shaofei, et al.
Published: (2024)