Furkejuvvon:
| Váldodahkkit: | , , , , , , , , , , |
|---|---|
| Materiálatiipa: | Preprint |
| Almmustuhtton: |
2025
|
| Fáttát: | |
| Liŋkkat: | https://arxiv.org/abs/2509.23402 |
| Fáddágilkorat: |
Lasit fáddágilkoriid
Eai fáddágilkorat, Lasit vuosttaš fáddágilkora!
|
Sisdoallologahallan:
- Recent advances in driving-scene generation and reconstruction have demonstrated significant potential for enhancing autonomous driving systems by producing scalable and controllable training data. Existing generation methods primarily focus on synthesizing diverse and high-fidelity driving videos; however, due to limited 3D consistency and sparse viewpoint coverage, they struggle to support convenient and high-quality novel-view synthesis (NVS). Conversely, recent 3D/4D reconstruction approaches have significantly improved NVS for real-world driving scenes, yet inherently lack generative capabilities. To overcome this dilemma between scene generation and reconstruction, we propose WorldSplat, a novel feed-forward framework for 4D driving-scene generation. Our approach effectively generates consistent multi-track videos through two key steps: (i) We introduce a 4D-aware latent diffusion model integrating multi-modal information to produce pixel-aligned 4D Gaussians in a feed-forward manner. (ii) Subsequently, we refine the novel view videos rendered from these Gaussians using a enhanced video diffusion model. Extensive experiments conducted on benchmark datasets demonstrate that WorldSplat effectively generates high-fidelity, temporally and spatially consistent multi-track novel view driving videos. Project: https://wm-research.github.io/worldsplat/