Bibliographic Details
Main Authors: Laguna, Sonia, Garcia-Garcia, Alberto, Rakotosaona, Marie-Julie, Moschoglou, Stylianos, Helminger, Leonhard, Orts-Escolano, Sergio
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition, Machine Learning
Online Access: https://arxiv.org/abs/2504.09328
_version_ 1866909578509156352
author Laguna, Sonia
Garcia-Garcia, Alberto
Rakotosaona, Marie-Julie
Moschoglou, Stylianos
Helminger, Leonhard
Orts-Escolano, Sergio
contents Modern machine learning models for scene understanding, such as depth estimation and object tracking, rely on large, high-quality datasets that mimic real-world deployment scenarios. To address data scarcity, we propose an end-to-end system for synthetic data generation for scalable, high-quality, and customizable 3D indoor scenes. By integrating and adapting text-to-image and multi-view diffusion models with Neural Radiance Field-based meshing, this system generates high-fidelity 3D object assets from text prompts and incorporates them into pre-defined floor plans using a rendering tool. By introducing novel loss functions and training strategies into existing methods, the system supports on-demand scene generation, aiming to alleviate the scarcity of currently available data, which is generally hand-crafted by artists. This system advances the role of synthetic data in addressing machine learning training limitations, enabling more robust and generalizable models for real-world applications.
format Preprint
id arxiv_https___arxiv_org_abs_2504_09328
institution arXiv
publishDate 2025
record_format arxiv
title Text To 3D Object Generation For Scalable Room Assembly
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2504.09328