:: Library Catalog

Bibliographic Details
Main Authors:	Ye, Angen; Wang, Boyuan; Ni, Chaojun; Huang, Guan; Zhao, Guosheng; Li, Hao; Li, Hengtao; Li, Jie; Lv, Jindi; Liu, Jingyu; Cao, Min; Li, Peng; Deng, Qiuping; Mei, Wenjun; Wang, Xiaofeng; Chen, Xinze; Zhou, Xinyu; Wang, Yang; Chang, Yifan; Li, Yifan; Zhou, Yukun; Ye, Yun; Liu, Zhichao; Zhu, Zheng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.17240

Similar Items

GigaWorld-0: World Models as Data Engine to Empower Embodied AI
by: GigaWorld Team, et al.
Published: (2025)

GigaBrain-0: A World Model-Powered Vision-Language-Action Model
by: GigaBrain Team, et al.
Published: (2025)

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
by: GigaBrain Team, et al.
Published: (2026)

ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
by: Wang, Boyuan, et al.
Published: (2026)

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)

EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling
by: Wang, Boyuan, et al.
Published: (2025)

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
by: Zhao, Guosheng, et al.
Published: (2024)

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
by: Ni, Chaojun, et al.
Published: (2025)

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
by: Lv, Jindi, et al.
Published: (2026)

EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer
by: Dong, Zhehao, et al.
Published: (2025)

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
by: Wang, Boyuan, et al.
Published: (2025)

WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration
by: Ni, Chaojun, et al.
Published: (2025)

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
by: Lang, Xiaolei, et al.
Published: (2026)

GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning
by: Bao, Xiaoyi, et al.
Published: (2025)

ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models
by: Ye, Wencheng, et al.
Published: (2025)

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
by: Wang, Xiaofeng, et al.
Published: (2024)

ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction
by: Ni, Chaojun, et al.
Published: (2025)

SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead
by: Ni, Chaojun, et al.
Published: (2025)

Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation
by: Niu, Ye, et al.
Published: (2025)

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
by: Zhao, Guosheng, et al.
Published: (2024)

WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
by: Wang, Qisen, et al.
Published: (2026)

Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty
by: Yang, Zhichao, et al.
Published: (2025)

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
by: Zhou, Yang, et al.
Published: (2026)

From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
by: Zhao, Zhida, et al.
Published: (2025)

IC-World: In-Context Generation for Shared World Modeling
by: Wu, Fan, et al.
Published: (2025)

RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
by: Li, Hengtao, et al.
Published: (2025)

World Models as Group Actions
by: Wang, Zijie, et al.
Published: (2026)

Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
by: Li, Yuhan, et al.
Published: (2025)

Co-Evolving Latent Action World Models
by: Wang, Yucen, et al.
Published: (2025)

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
by: Ni, Chaojun, et al.
Published: (2024)

HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
by: Wang, Boyuan, et al.
Published: (2025)

MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training
by: Li, Haoyun, et al.
Published: (2025)

DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion
by: Wang, Weijie, et al.
Published: (2025)

MAPF-World: Action World Model for Multi-Agent Path Finding
by: Yang, Zhanjiang, et al.
Published: (2025)

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
by: Liu, Yushan, et al.
Published: (2026)

ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
by: Zhao, Guosheng, et al.
Published: (2025)

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
by: Guo, Jun, et al.
Published: (2026)

The DAWN of World-Action Interactive Models
by: Lu, Hongbo, et al.
Published: (2026)