Staff View: Generative AI for automatic topic labelling :: Library Catalog

Bibliographic Details
Main Authors:	Kozlowski, Diego; Pradier, Carolina; Benz, Pierre
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.07003

_version_	1866909286282559488
author	Kozlowski, Diego; Pradier, Carolina; Benz, Pierre
author_facet	Kozlowski, Diego; Pradier, Carolina; Benz, Pierre
contents	Topic modeling has become a prominent tool for the study of scientific fields, as it allows for a large-scale interpretation of research trends. Nevertheless, the output of these models is structured as a list of keywords, which requires manual interpretation for labelling. This paper assesses the reliability of three LLMs, namely flan, GPT-4o, and GPT-4 mini, for topic labelling. Drawing on previous research leveraging BERTopic, we generate topics from a dataset of all the scientific articles (n=34,797) authored by all biology professors in Switzerland (n=465) between 2008 and 2020, as recorded in the Web of Science database. We assess the output of the three models both quantitatively and qualitatively and find that, first, both GPT models are capable of accurately and precisely labelling topics from the models' output keywords. Second, 3-word labels are preferable to grasp the complexity of research topics.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_07003
institution	arXiv
publishDate	2024
record_format	arxiv
title	Generative AI for automatic topic labelling
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2408.07003

Similar Items