Staff View: Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram :: Library Catalog

Bibliographic Details
Main Authors:	Min, Aehong; Wang, Xuan; Correia, Rion Brattig; Rozum, Jordan; Miller, Wendy R.; Rocha, Luis M.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Social and Information Networks
Online Access:	https://arxiv.org/abs/2405.08784

_version_	1866911876793761792
author	Min, Aehong; Wang, Xuan; Correia, Rion Brattig; Rozum, Jordan; Miller, Wendy R.; Rocha, Luis M.
author_facet	Min, Aehong; Wang, Xuan; Correia, Rion Brattig; Rozum, Jordan; Miller, Wendy R.; Rocha, Luis M.
contents	We used a dictionary built from biomedical terminology extracted from various sources, such as DrugBank, MedDRA, MedlinePlus, and TCMGeneDIT, to tag more than 8 million Instagram posts by users who had mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary leads to a significantly different ranking of important terms, as measured by their eigenvector centrality in the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_08784
institution	arXiv
publishDate	2024
record_format	arxiv
title	Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram
topic	Computation and Language Social and Information Networks
url	https://arxiv.org/abs/2405.08784
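
The abstract describes a pipeline of dictionary tagging, per-term false-positive estimation from human annotation, removal of high false-positive terms, and eigenvector-centrality ranking of terms in a knowledge network built with each dictionary. The Python sketch below illustrates that general idea under stated assumptions only; it is not the authors' implementation. Assumptions: tagging is naive single-word matching; the knowledge network links two terms whenever they co-occur in the same post, weighted by the number of such posts; the 0.5 false-positive threshold is hypothetical; all function names and the toy posts and annotations are invented for illustration. Eigenvector centrality is computed by power iteration on A + I, a shift that keeps the dominant eigenvector but avoids oscillation.

```python
import math
import re
from collections import defaultdict
from itertools import combinations


def false_positive_rates(annotations):
    """Per-term false-positive rate from (term, is_correct) human judgements."""
    counts = defaultdict(lambda: [0, 0])  # term -> [false positives, total matches]
    for term, is_correct in annotations:
        counts[term][1] += 1
        if not is_correct:
            counts[term][0] += 1
    return {term: fp / total for term, (fp, total) in counts.items()}


def refine(dictionary, fp_rates, threshold=0.5):
    """Drop terms whose estimated false-positive rate exceeds a hypothetical threshold.
    Terms that were never annotated are kept (an assumption)."""
    return {t for t in dictionary if fp_rates.get(t, 0.0) <= threshold}


def tag_post(text, dictionary):
    """Naive single-word tagging: lowercase the post and intersect its tokens with the dictionary."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return tokens & set(dictionary)


def cooccurrence_network(posts, dictionary):
    """Hypothetical knowledge network: edge weight = number of posts in which two terms co-occur."""
    nodes, weights = set(), defaultdict(int)
    for text in posts:
        terms = tag_post(text, dictionary)
        nodes |= terms
        for u, v in combinations(sorted(terms), 2):
            weights[(u, v)] += 1
    return nodes, dict(weights)


def eigenvector_centrality(nodes, weights, max_iter=1000, tol=1e-10):
    """Power iteration on the weighted adjacency matrix shifted by the identity (A + I)."""
    if not nodes:
        return {}
    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        nxt = dict(x)  # the "+ I" contribution
        for (u, v), w in weights.items():  # undirected edge contributes in both directions
            nxt[u] += w * x[v]
            nxt[v] += w * x[u]
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {n: val / norm for n, val in nxt.items()}
        if max(abs(nxt[n] - x[n]) for n in nodes) < tol:
            return nxt
        x = nxt
    return x


def ranking(centrality):
    """Terms sorted by decreasing centrality, ties broken alphabetically."""
    return sorted(centrality, key=lambda t: (-centrality[t], t))


# Toy data (invented): "high" is an ambiguous term that annotators mostly reject.
dictionary = {"seizure", "keppra", "aura", "high"}
annotations = [("seizure", True), ("seizure", True), ("keppra", True),
               ("high", False), ("high", False), ("high", True)]
posts = [
    "Had a seizure after missing my keppra",
    "feeling high energy today",
    "keppra and seizure control",
    "aura before the seizure",
    "high on keppra today",
]

refined = refine(dictionary, false_positive_rates(annotations))
for name, d in [("original", dictionary), ("refined", refined)]:
    print(name, ranking(eigenvector_centrality(*cooccurrence_network(posts, d))))
# the refined line prints: refined ['seizure', 'keppra', 'aura']
```

In this toy example, removing "high" (estimated false-positive rate 2/3) breaks the tie between "seizure" and "keppra" in the original network, so "seizure" ranks first after refinement. A real comparison of rankings, as the abstract describes, would call for a proper rank-correlation test on the full term lists rather than inspection of a handful of terms.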

Similar Items