| Main Authors: | Morales, Luis Yoichi, Zanlungo, Francesco, Woollard, David M. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.00928 |
Similar Items
MARLIN: A Cloud Integrated Robotic Solution to Support Intralogistics in Retail
by: Mronga, Dennis, et al.
Published: (2024)
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
by: Yang, Yuncong, et al.
Published: (2025)
Break Out the Silverware -- Semantic Understanding of Stored Household Items
by: Levi-Richter, Michaela, et al.
Published: (2025)
PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models
by: Rouhi, Amirreza, et al.
Published: (2026)
MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception
by: Hao, Xiaoshuai, et al.
Published: (2025)
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
by: Lohner, Aaron, et al.
Published: (2024)
4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview
by: Kiefer, Benjamin, et al.
Published: (2026)
A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision
by: Rohan, Ali, et al.
Published: (2025)
A Systematic Literature Review of Computer Vision Applications in Robotized Wire Harness Assembly
by: Wang, Hao, et al.
Published: (2023)
Overview of Computer Vision Techniques in Robotized Wire Harness Assembly: Current State and Future Opportunities
by: Wang, Hao, et al.
Published: (2023)
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)
Free-form language-based robotic reasoning and grasping
by: Jiao, Runyu, et al.
Published: (2025)
Obstruction reasoning for robotic grasping
by: Jiao, Runyu, et al.
Published: (2025)
Robot Learning from a Physical World Model
by: Mao, Jiageng, et al.
Published: (2025)
Physically Grounded Vision-Language Models for Robotic Manipulation
by: Gao, Jensen, et al.
Published: (2023)
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
by: Patel, Shivansh, et al.
Published: (2025)
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
by: Xu, Xinli, et al.
Published: (2024)
Digital Gene: Learning about the Physical World through Analytic Concepts
by: Sun, Jianhua, et al.
Published: (2025)
Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
by: Gillman, Nate, et al.
Published: (2026)
ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
by: Wang, Meizhong, et al.
Published: (2026)
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
by: Zhong, Yiming, et al.
Published: (2025)
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos
by: Jiang, Hanxiao, et al.
Published: (2025)
A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring
by: Wang, Wenze, et al.
Published: (2026)
PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions
by: Lee, Jihyun, et al.
Published: (2026)
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
by: Li, Shilong, et al.
Published: (2025)
Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions
by: Alfaro, Marcos, et al.
Published: (2024)
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning
by: Qi, Xiuxiu, et al.
Published: (2025)
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
by: Zhou, Yunsong, et al.
Published: (2026)
PhyGile: Physics-Prefix Guided Motion Generation for Agile General Humanoid Motion Tracking
by: Bao, Jiacheng, et al.
Published: (2026)
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly
by: Ma, Liang, et al.
Published: (2025)
Multi-Modal World Model for Physical Robot Interactions: Simultaneous Visual and Tactile Predictions for Enhanced Accuracy
by: Mandil, Willow, et al.
Published: (2023)
Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
by: Lu, Haoran, et al.
Published: (2026)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model
by: Azhari, Maulana Bisyir, et al.
Published: (2025)
FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
by: Eisner, Ben, et al.
Published: (2022)
PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis
by: Yang, Yu, et al.
Published: (2025)
REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception
by: Polizzi, Vincenzo, et al.
Published: (2026)
Evaluation of Large Language Models for Anomaly Detection in Autonomous Vehicles
by: Loukas, Petros, et al.
Published: (2025)
Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields
by: Hausler, Stephen, et al.
Published: (2024)
H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos
by: Ci, Hai, et al.
Published: (2025)
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru
by: Cusipuma, Dunant, et al.
Published: (2025)