Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Yiyao; Xue, Ying; Zhang, Haiming; Jiang, Guangfeng; Zhou, Wending; Yan, Xu; Gao, Jiantao; Cai, Yingjie; Liu, Bingbing; Li, Zhen; Shen, Shaojie
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.00969

_version_	1866917377747189760
author	Zhu, Yiyao; Xue, Ying; Zhang, Haiming; Jiang, Guangfeng; Zhou, Wending; Yan, Xu; Gao, Jiantao; Cai, Yingjie; Liu, Bingbing; Li, Zhen; Shen, Shaojie
author_facet	Zhu, Yiyao; Xue, Ying; Zhang, Haiming; Jiang, Guangfeng; Zhou, Wending; Yan, Xu; Gao, Jiantao; Cai, Yingjie; Liu, Bingbing; Li, Zhen; Shen, Shaojie
contents	Vision-based autonomous driving has gained much attention due to its low cost and excellent performance. Compared with dense BEV (Bird's Eye View) or sparse query models, the Gaussian-centric method offers a comprehensive yet sparse representation, describing the scene with 3D semantic Gaussians. In this paper, we introduce DLWM, a novel paradigm with Dual Latent World Models, specifically designed to enable holistic Gaussian-centric pre-training in autonomous driving in two stages. In the first stage, DLWM predicts 3D Gaussians from queries through self-supervised reconstruction of multi-view semantic and depth images. In the second stage, equipped with these fine-grained contextual features, two latent world models are trained separately for temporal feature learning: Gaussian-flow-guided latent prediction for downstream occupancy perception and forecasting tasks, and ego-planning-guided latent prediction for motion planning. Extensive experiments on the SurroundOcc and nuScenes benchmarks demonstrate that DLWM yields significant performance gains across Gaussian-centric 3D occupancy perception, 4D occupancy forecasting, and motion planning tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_00969
institution	arXiv
publishDate	2026
record_format	arxiv
title	DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.00969
