Staff View: Boosting Statistic Learning with Synthetic Data from Pretrained Large Models :: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Jialong; Hu, Wenkang; Huang, Jian; Jiao, Yuling; Liu, Xu
Format:	Preprint
Published:	2025
Subjects:	Machine Learning Applications
Online Access:	https://arxiv.org/abs/2505.04992

_version_	1866910932670611456
author	Jiang, Jialong; Hu, Wenkang; Huang, Jian; Jiao, Yuling; Liu, Xu
author_facet	Jiang, Jialong; Hu, Wenkang; Huang, Jian; Jiao, Yuling; Liu, Xu
contents	The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully improves performance. We propose a novel end-to-end framework that generates and systematically filters synthetic data through domain-specific statistical methods, selectively integrating high-quality samples for effective augmentation. Our experiments demonstrate consistent improvements in predictive performance across various settings, highlighting the potential of our framework while underscoring the inherent limitations of generative models for data augmentation. Despite the ability to produce large volumes of synthetic data, the proportion that effectively improves model performance is limited.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_04992
institution	arXiv
publishDate	2025
record_format	arxiv
title	Boosting Statistic Learning with Synthetic Data from Pretrained Large Models
topic	Machine Learning Applications
url	https://arxiv.org/abs/2505.04992

Similar Items