Bibliographic Details
Main Authors: Merino, Tim; Earle, Sam; Iwai, Ryunosuke; Togelius, Julian; Cetin, Edoardo
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2604.22847
author Merino, Tim
Earle, Sam
Iwai, Ryunosuke
Togelius, Julian
Cetin, Edoardo
contents We introduce Dream-Cubed, a large-scale dataset of Minecraft worlds at voxel resolution, and a family of models using cubes as powerful compositional units for efficient generation of interactive 3D environments. Dream-Cubed comprises tens of billions of tokens from a carefully curated mixture of procedural biome terrain and high-quality human-authored maps. We use this dataset to conduct the first large-scale study of 3D diffusion models for voxel generation, analyzing discrete and continuous diffusion formulations, data compositions, and architectural design choices. Our models operate directly in the space of blocks, enabling efficient and semantically grounded generation while supporting interactive user workflows such as inpainting and outpainting from user-authored blocks. To quantitatively evaluate our models, we adapt the FID metric to assess semantic differences between real and generated world renderings, and validate generation quality through a human preference study. We release the full dataset, code, and all our pretrained models, which we hope will provide a foundation for future research in efficient generative modeling for structured, interactive 3D environments.
format Preprint
id arxiv_https___arxiv_org_abs_2604_22847
institution arXiv
publishDate 2026
record_format arxiv
title Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2604.22847