Bibliographic Details
Main Authors: Yang, Shuo; Yuan, Chenchen; Rong, Yao; Steinbauer, Felix; Kasneci, Gjergji
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2406.11391
_version_ 1866916625176854528
author Yang, Shuo
Yuan, Chenchen
Rong, Yao
Steinbauer, Felix
Kasneci, Gjergji
contents A multitude of industries depend on accurate and reasonable tabular data augmentation for their business processes. Contemporary methodologies in generating tabular data revolve around utilizing Generative Adversarial Networks (GAN) or fine-tuning Large Language Models (LLM). However, GAN-based approaches are documented to produce samples with common-sense errors attributed to the absence of external knowledge. On the other hand, LLM-based methods exhibit a limited capacity to capture the disparities between synthesized and actual data distribution due to the absence of feedback from a discriminator during training. Furthermore, the decoding of LLM-based generation introduces gradient breakpoints, impeding the backpropagation of loss from a discriminator, thereby complicating the integration of these two approaches. To solve this challenge, we propose using proximal policy optimization (PPO) to apply GANs, guiding LLMs to enhance the probability distribution of tabular features. This approach enables the utilization of LLMs as generators for GANs in synthesizing tabular data. Our experiments demonstrate that PPO leads to an approximately 4% improvement in the accuracy of models trained on synthetically generated data over state-of-the-art across three real-world datasets.
format Preprint
id arxiv_https___arxiv_org_abs_2406_11391
institution arXiv
publishDate 2024
record_format arxiv
title P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models
topic Machine Learning
url https://arxiv.org/abs/2406.11391