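Staff View: A Unified Framework for Tabular Generative Modeling: Loss Functions, Benchmarks, and Improved Multi-objective Bayesian Optimization Approaches :: Library Catalog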

Bibliographic Details
Main Authors:	Vu, Minh H.; Edler, Daniel; Wibom, Carl; Löfstedt, Tommy; Melin, Beatrice; Rosvall, Martin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2405.16971

_version_	1866914528290144256
author	Vu, Minh H.; Edler, Daniel; Wibom, Carl; Löfstedt, Tommy; Melin, Beatrice; Rosvall, Martin
author_facet	Vu, Minh H.; Edler, Daniel; Wibom, Carl; Löfstedt, Tommy; Melin, Beatrice; Rosvall, Martin
contents	Deep learning (DL) models require extensive data to achieve strong performance and generalization. Deep generative models (DGMs) offer a solution by synthesizing data. Yet current approaches for tabular data often fail to preserve feature correlations and distributions during training, struggle with multi-metric hyperparameter selection, and lack comprehensive evaluation protocols. We address this gap with a unified framework that integrates training, hyperparameter tuning, and evaluation. First, we introduce a novel correlation- and distribution-aware loss function that regularizes DGMs, enhancing their ability to generate synthetic tabular data that faithfully represents the underlying data distributions. Theoretical analysis establishes stability and consistency guarantees. To enable principled hyperparameter search via Bayesian optimization (BO), we also propose a new multi-objective aggregation strategy based on iterative objective refinement Bayesian optimization (IORBO), along with a comprehensive statistical testing framework. We validate the proposed approach using a benchmarking framework with twenty real-world datasets and ten established tabular DGM baselines. The correlation-aware loss function significantly improves synthetic data fidelity and downstream machine learning (ML) performance, while IORBO consistently outperforms standard Bayesian optimization (SBO) in hyperparameter selection. The unified framework advances tabular generative modeling beyond isolated method improvements. Code is available at: https://github.com/vuhoangminh/TabGen-Framework
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_16971
institution	arXiv
publishDate	2024
record_format	arxiv
title	A Unified Framework for Tabular Generative Modeling: Loss Functions, Benchmarks, and Improved Multi-objective Bayesian Optimization Approaches
topic	Machine Learning
url	https://arxiv.org/abs/2405.16971