Bibliographic Details
Main Authors: Ren, Yi, Zhang, Tianyi, Li, Weibin, Zhou, DuoMu, Qin, Chenhao, Dong, FangCheng
Format: Preprint
Published: 2024
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.18548
author Ren, Yi
Zhang, Tianyi
Li, Weibin
Zhou, DuoMu
Qin, Chenhao
Dong, FangCheng
contents In recent years, with the rapid development of large language models, several models such as GPT-4o have demonstrated extraordinary capabilities, surpassing human performance in various language tasks. As a result, many researchers have begun exploring their potential applications in public opinion analysis. This study proposes a novel method, based on large language models, for predicting the heat level of public opinion events. First, we preprocessed and classified data on 62,836 trending Chinese events collected between July 2022 and December 2023. Then, based on each event's online dissemination heat index, we used the MiniBatchKMeans algorithm to automatically cluster the events into four heat levels (ranging from low heat to very high heat). Next, we randomly selected 250 events from each heat level, 1,000 events in total, to build the evaluation dataset. During evaluation, we used various large language models to predict event heat levels in two scenarios, without reference cases and with references to similar cases, and measured their accuracy. The results showed that GPT-4o and DeepseekV2 performed best in the latter scenario, achieving prediction accuracies of 41.4% and 41.5%, respectively. Although overall prediction accuracy remains relatively low, for low-heat (Level 1) events the two models reached accuracies of 73.6% and 70.4%, respectively. Additionally, prediction accuracy declined from Level 1 to Level 4, which correlates with the uneven distribution of events across heat levels in the full dataset. This suggests that, with a more robust dataset, heat level prediction for public opinion events based on large language models holds significant research potential.
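The clustering, sampling, and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract's MiniBatchKMeans presumably refers to scikit-learn's sklearn.cluster.MiniBatchKMeans; to stay dependency-free, this sketch implements the mini-batch update by hand on a one-dimensional heat index and, as a simplifying assumption, initialises centers at data quantiles rather than with k-means++. All function names (minibatch_kmeans_1d, heat_level, balanced_sample, per_level_accuracy, overall_accuracy) are hypothetical; the authors' preprocessing, heat index definition, and prompts are not described in the record and are not reproduced.

```python
import random
from collections import Counter

# Hypothetical sketch, not the authors' code: heat-level clustering,
# balanced sampling, and per-level accuracy as described in the abstract.


def minibatch_kmeans_1d(values, k=4, batch_size=256, n_iter=100, seed=0):
    """Mini-batch k-means on scalar heat indices; returns centers ascending.

    Assumption: centers start at evenly spaced quantiles of the data
    (a 1-D simplification), not k-means++ initialisation.
    """
    rng = random.Random(seed)
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    counts = [0] * k  # points seen so far by each center
    for _ in range(n_iter):
        # Draw a mini-batch (with replacement) from the data
        batch = [rng.choice(ordered) for _ in range(batch_size)]
        # Assign every batch point to its nearest center before updating
        nearest = [min(range(k), key=lambda j: abs(x - centers[j])) for x in batch]
        # Move each center toward its points with per-center step size 1/count
        for x, j in zip(batch, nearest):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return sorted(centers)


def heat_level(x, centers):
    """Level 1 (low heat) to k (very high heat), given centers sorted ascending."""
    return 1 + min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def balanced_sample(events, levels, per_level=250, seed=0):
    """Randomly draw per_level events from each level (ValueError if too few)."""
    rng = random.Random(seed)
    by_level = {}
    for event, lvl in zip(events, levels):
        by_level.setdefault(lvl, []).append(event)
    return {lvl: rng.sample(items, per_level) for lvl, items in sorted(by_level.items())}


def per_level_accuracy(gold, pred):
    """Fraction of correct predictions within each gold heat level."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    return {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}


def overall_accuracy(gold, pred):
    """Fraction of all predictions that match the gold level."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

Because the evaluation set is balanced (250 events per level), overall accuracy equals the mean of the four per-level accuracies. For example, GPT-4o's reported 41.4% overall (414 of 1,000 correct) together with 73.6% on Level 1 (184 of 250) implies 230 correct predictions among the 750 events in Levels 2 to 4, roughly 30.7%.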
institution arXiv
title Research on Predicting Public Opinion Event Heat Levels Based on Large Language Models
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2409.18548