Staff View: :: Library Catalog

Bibliographic Details
Main Author:	Zeng, Linda
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.00071

_version_	1866911230195662848
author	Zeng, Linda
author_facet	Zeng, Linda
contents	Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack the large-scale data corpora that models need for training. As manual data curation is expensive and time-consuming, we propose using a generative-adversarial network (GAN) to augment low-resource language data. When trained on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential for data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that GANs are a promising direction for future work in low-resource NMT.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_00071
institution	arXiv
publishDate	2024
record_format	arxiv
title	Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation
topic	Computation and Language
url	https://arxiv.org/abs/2409.00071