:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Meizhong; Jin, Wanxin; Cao, Kun; Xie, Lihua; Hong, Yiguang
Format:	Preprint
Published:	2026
Subjects:	Robotics; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.11021

Similar Items

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields
by: Yang, Zhaoyang, et al.
Published: (2026)

Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout
by: Chi, Haozhuang, et al.
Published: (2026)

PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis
by: Yang, Yu, et al.
Published: (2025)

Robot Learning from a Physical World Model
by: Mao, Jiageng, et al.
Published: (2025)

World Models for Learning Dexterous Hand-Object Interactions from Human Videos
by: Goswami, Raktim Gautam, et al.
Published: (2025)

World Simulation with Video Foundation Models for Physical AI
by: NVIDIA, et al.
Published: (2025)

Rethinking Video Generation Model for the Embodied World
by: Deng, Yufan, et al.
Published: (2026)

Physically Grounded Vision-Language Models for Robotic Manipulation
by: Gao, Jensen, et al.
Published: (2023)

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos
by: Ci, Hai, et al.
Published: (2025)

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
by: Zhou, Yang, et al.
Published: (2026)

Ego-Grounding for Personalized Question-Answering in Egocentric Videos
by: Xiao, Junbin, et al.
Published: (2026)

UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
by: Xu, Zhengtong, et al.
Published: (2026)

ICAT: Incident-Case-Grounded Adaptive Testing for Physical-Risk Prediction in Embodied World Models
by: Lai, Zhenglin, et al.
Published: (2026)

ContactHandover: Contact-Guided Robot-to-Human Object Handover
by: Wang, Zixi, et al.
Published: (2024)

Mirage2Matter: A Physically Grounded Gaussian World Model from Video
by: Gao, Zhengqing, et al.
Published: (2026)

One-Shot Manipulation Strategy Learning by Making Contact Analogies
by: Liu, Yuyao, et al.
Published: (2024)

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
by: Zhou, Yang, et al.
Published: (2026)

World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty
by: Mei, Zhiting, et al.
Published: (2025)

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
by: Guo, Jun, et al.
Published: (2026)

Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
by: Fu, Ao, et al.
Published: (2024)

Digital Gene: Learning about the Physical World through Analytic Concepts
by: Sun, Jianhua, et al.
Published: (2025)

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
by: Xu, Xinli, et al.
Published: (2024)

DDP-WM: Disentangled Dynamics Prediction for Efficient World Models
by: Yin, Shicheng, et al.
Published: (2026)

Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM
by: Pham, Phu, et al.
Published: (2024)

Grounding Video Models to Actions through Goal Conditioned Exploration
by: Luo, Yunhao, et al.
Published: (2024)

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
by: Lu, Guanxing, et al.
Published: (2025)

Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
by: Gillman, Nate, et al.
Published: (2026)

Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning
by: Qi, Xiuxiu, et al.
Published: (2025)

GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
by: He, Haoyang, et al.
Published: (2025)

Chain of World: World Model Thinking in Latent Motion
by: Yang, Fuxiang, et al.
Published: (2026)

Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
by: Cao, Zhihao, et al.
Published: (2024)

Multi-Modal World Model for Physical Robot Interactions: Simultaneous Visual and Tactile Predictions for Enhanced Accuracy
by: Mandil, Willow, et al.
Published: (2023)

PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
by: Huang, Wenlong, et al.
Published: (2026)

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
by: Zeng, Tianle, et al.
Published: (2026)

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
by: Lu, Haoran, et al.
Published: (2026)

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
by: Chen, Hongjin, et al.
Published: (2026)

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
by: Dharmarajan, Karthik, et al.
Published: (2025)

AdaWorld: Learning Adaptable World Models with Latent Actions
by: Gao, Shenyuan, et al.
Published: (2025)

MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats
by: Yuan, Shenghai, et al.
Published: (2024)

Learning 3D-Gaussian Simulators from RGB Videos
by: Zhobro, Mikel, et al.
Published: (2025)