Bibliographic Details
Main Authors: Zhu, Chloe Qinyu, Stureborg, Rickard, Dhingra, Bhuwan
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence, Machine Learning
Online Access: https://arxiv.org/abs/2402.01783
author Zhu, Chloe Qinyu
Stureborg, Rickard
Dhingra, Bhuwan
contents Vaccine concerns are an ever-evolving target and can shift quickly, as seen during the COVID-19 pandemic. Identifying longitudinal trends in vaccine concerns and misinformation might inform the healthcare space by helping public health efforts strategically allocate resources or information campaigns. We explore the task of detecting vaccine concerns in online discourse using large language models (LLMs) in a zero-shot setting, without the need for expensive training datasets. Since real-time monitoring of online sources requires large-scale inference, we explore cost-accuracy trade-offs of different prompting strategies and offer concrete takeaways that may inform choices in system designs for current applications. An analysis of different prompting strategies reveals that classifying the concerns over multiple passes through the LLM, each consisting of a boolean question asking whether the text mentions a vaccine concern or not, works best. Our results indicate that GPT-4 can strongly outperform crowdworker accuracy when compared to ground truth annotations provided by experts on the recently introduced VaxConcerns dataset, achieving an overall F1 score of 78.7%.
format Preprint
id arxiv_https___arxiv_org_abs_2402_01783
institution arXiv
publishDate 2024
record_format arxiv
title Hierarchical Multi-Label Classification of Online Vaccine Concerns
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2402.01783