Staff View: Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report :: Library Catalog

Bibliographic Details
Main Authors:	Mereu, Riccardo; Scannell, Aidan; Hou, Yuxin; Zhao, Yi; Jitta, Aditya; Dominguez, Antonio; Acerbi, Luigi; Storkey, Amos; Chang, Paul
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Robotics
Online Access:	https://arxiv.org/abs/2510.07092

_version_	1866915539496992768
author	Mereu, Riccardo; Scannell, Aidan; Hou, Yuxin; Zhao, Yi; Jitta, Aditya; Dominguez, Antonio; Acerbi, Luigi; Storkey, Amos; Chang, Paul
author_facet	Mereu, Riccardo; Scannell, Aidan; Hou, Yuxin; Zhao, Yi; Jitta, Aditya; Dominguez, Antonio; Acerbi, Luigi; Storkey, Amos; Chang, Paul
contents	World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using AdaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch. Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_07092
institution	arXiv
publishDate	2025
record_format	arxiv
title	Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
topic	Machine Learning; Artificial Intelligence; Robotics
url	https://arxiv.org/abs/2510.07092
