MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Schofield, Hunter; Elmahgiubi, Mohammed; Rezaee, Kasra; Shan, Jinjun
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2508.01922

_version_	1866916878966849536
author	Schofield, Hunter; Elmahgiubi, Mohammed; Rezaee, Kasra; Shan, Jinjun
author_facet	Schofield, Hunter; Elmahgiubi, Mohammed; Rezaee, Kasra; Shan, Jinjun
contents	World models have become increasingly popular as learned traffic simulators. Recent work has explored replacing traditional traffic simulators with world models for policy training. In this work, we examine whether existing metrics for evaluating world models as traffic simulators are robust, and whether the same metrics are suitable for evaluating a world model as a pseudo-environment for policy training. Specifically, we analyze the metametric employed by the Waymo Open Sim-Agents Challenge (WOSAC) and compare world model predictions on standard scenarios where the agents are fully or partially controlled by the world model (partial replay). Furthermore, since we are interested in evaluating the ego action-conditioned world model, we extend the standard WOSAC evaluation domain to include agents that are causal to the ego vehicle. Our evaluations reveal a significant number of scenarios where top-ranking models perform well under no perturbation but fail when the ego agent is forced to replay the original trajectory. To address these cases, we propose new metrics that highlight the sensitivity of world models to uncontrollable objects and evaluate the performance of world models as pseudo-environments for policy training. We then analyze several state-of-the-art world models under these new metrics.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_01922
institution	arXiv
publishDate	2025
record_format	arxiv
title	Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving
topic	Robotics
url	https://arxiv.org/abs/2508.01922
