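Staff View: An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification :: Library Catalog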

Bibliographic Details
Main Authors:	Chen, Zhuowei; Wang, Lianxi; Wu, Yuben; Liao, Xinfeng; Tian, Yujia; Zhong, Junyang
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.03203

_version_	1866913513375531008
author	Chen, Zhuowei; Wang, Lianxi; Wu, Yuben; Liao, Xinfeng; Tian, Yujia; Zhong, Junyang
author_facet	Chen, Zhuowei; Wang, Lianxi; Wu, Yuben; Liao, Xinfeng; Tian, Yujia; Zhong, Junyang
contents	Sentiment classification (SC) often suffers from low-resource challenges such as domain-specific contexts, imbalanced label distributions, and few-shot scenarios. The potential of the diffusion language model (LM) for textual data augmentation (DA) remains unexplored. Moreover, textual DA methods struggle to balance the diversity and consistency of new samples. Most DA methods either perform logical modifications or use the language model to rephrase less important tokens in the original sequence. In the context of SC, strongly emotional tokens can be critical to the sentiment of the whole sequence. Therefore, rather than rephrasing less important context, we propose DiffusionCLS, which leverages a diffusion LM to capture in-domain knowledge and generates pseudo samples by reconstructing strongly label-related tokens. This approach balances consistency and diversity, avoiding the introduction of noise while augmenting crucial features of datasets. DiffusionCLS also includes a Noise-Resistant Training objective to help the model generalize. Experiments demonstrate the effectiveness of our method in various low-resource scenarios, including domain-specific and domain-general problems. Ablation studies confirm the effectiveness of our framework's modules, and visualization studies highlight optimal deployment conditions, reinforcing our conclusions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_03203
institution	arXiv
publishDate	2024
record_format	arxiv
title	An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2409.03203
