Bibliographic Details
Main Authors: Zhong, Qihuang, Li, Haiyun, Zhuang, Luyao, Liu, Juhua, Du, Bo
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2407.00341
contents Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task that aims to determine the sentiment polarity toward an aspect in a sentence. Because labeled data are expensive and scarce, data generation (DG) has become the standard approach for improving ABSA performance. However, current DG methods usually have three shortcomings: 1) poor fluency and coherence, 2) limited diversity of the generated data, and 3) reliance on existing labeled data, which hinders their application in real-world scenarios. With the advancement of large language models (LLMs), LLM-based DG has the potential to address these issues. Unfortunately, directly prompting LLMs often fails to produce the desired pseudo-labeled ABSA data, because LLMs are prone to hallucinations that lead to undesired generations. To this end, we propose a systematic Iterative Data Generation framework, namely IDG, to boost ABSA performance. The core of IDG is to make full use of the abilities of LLMs (i.e., instruction following, in-context learning, and self-reflection) to iteratively generate more fluent and diverse pseudo-labeled data, starting from an unlabeled sentence corpus. Specifically, IDG introduces a novel iterative data generation mechanism and a self-reflection data filtering module to counter the unexpected generations caused by hallucinations. Extensive experiments on four widely used ABSA benchmarks show that IDG brings consistent and significant performance gains across five baseline ABSA models. More encouragingly, the synthetic data generated by IDG can achieve performance comparable to, or even better than, that of manually annotated data.
id arxiv_https___arxiv_org_abs_2407_00341
institution arXiv
record_format arxiv
title Iterative Data Generation with Large Language Models for Aspect-based Sentiment Analysis
topic Computation and Language