:: Library Catalog

Bibliographic Details
Main Authors:	Morales, Luis Yoichi; Zanlungo, Francesco; Woollard, David M.
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Robotics
Online Access:	https://arxiv.org/abs/2601.00928

Similar Items

MARLIN: A Cloud Integrated Robotic Solution to Support Intralogistics in Retail
by: Mronga, Dennis, et al.
Published: (2024)

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
by: Yang, Yuncong, et al.
Published: (2025)

Break Out the Silverware -- Semantic Understanding of Stored Household Items
by: Levi-Richter, Michaela, et al.
Published: (2025)

PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models
by: Rouhi, Amirreza, et al.
Published: (2026)

MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception
by: Hao, Xiaoshuai, et al.
Published: (2025)

Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
by: Lohner, Aaron, et al.
Published: (2024)

4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview
by: Kiefer, Benjamin, et al.
Published: (2026)

A Systematic Literature Review on Deep Learning-based Depth Estimation in Computer Vision
by: Rohan, Ali, et al.
Published: (2025)

A Systematic Literature Review of Computer Vision Applications in Robotized Wire Harness Assembly
by: Wang, Hao, et al.
Published: (2023)

Overview of Computer Vision Techniques in Robotized Wire Harness Assembly: Current State and Future Opportunities
by: Wang, Hao, et al.
Published: (2023)

VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)

Free-form language-based robotic reasoning and grasping
by: Jiao, Runyu, et al.
Published: (2025)

Obstruction reasoning for robotic grasping
by: Jiao, Runyu, et al.
Published: (2025)

Robot Learning from a Physical World Model
by: Mao, Jiageng, et al.
Published: (2025)

Physically Grounded Vision-Language Models for Robotic Manipulation
by: Gao, Jensen, et al.
Published: (2023)

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
by: Patel, Shivansh, et al.
Published: (2025)

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
by: Xu, Xinli, et al.
Published: (2024)

Digital Gene: Learning about the Physical World through Analytic Concepts
by: Sun, Jianhua, et al.
Published: (2025)

Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
by: Gillman, Nate, et al.
Published: (2026)

ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
by: Wang, Meizhong, et al.
Published: (2026)

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
by: Zhong, Yiming, et al.
Published: (2025)

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos
by: Jiang, Hanxiao, et al.
Published: (2025)

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring
by: Wang, Wenze, et al.
Published: (2026)

PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions
by: Lee, Jihyun, et al.
Published: (2026)

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
by: Li, Shilong, et al.
Published: (2025)

Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions
by: Alfaro, Marcos, et al.
Published: (2024)

Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning
by: Qi, Xiuxiu, et al.
Published: (2025)

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
by: Zhou, Yunsong, et al.
Published: (2026)

PhyGile: Physics-Prefix Guided Motion Generation for Agile General Humanoid Motion Tracking
by: Bao, Jiacheng, et al.
Published: (2026)

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly
by: Ma, Liang, et al.
Published: (2025)

Multi-Modal World Model for Physical Robot Interactions: Simultaneous Visual and Tactile Predictions for Enhanced Accuracy
by: Mandil, Willow, et al.
Published: (2023)

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
by: Lu, Haoran, et al.
Published: (2026)

DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model
by: Azhari, Maulana Bisyir, et al.
Published: (2025)

FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
by: Eisner, Ben, et al.
Published: (2022)

PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis
by: Yang, Yu, et al.
Published: (2025)

REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception
by: Polizzi, Vincenzo, et al.
Published: (2026)

Evaluation of Large Language Models for Anomaly Detection in Autonomous Vehicles
by: Loukas, Petros, et al.
Published: (2025)

Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields
by: Hausler, Stephen, et al.
Published: (2024)

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos
by: Ci, Hai, et al.
Published: (2025)

Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru
by: Cusipuma, Dunant, et al.
Published: (2025)