Bibliographic Details
Main Authors: Wong, Albert; Cheng, Florence Wing Yau; Keung, Ashley; Hercules, Yamileth; Garcia, Mary Alexandra; Lim, Yew-Wei; Pham, Lien
Format: Preprint
Published: 2024
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2407.17892
_version_ 1866929436899672064
author Wong, Albert
Cheng, Florence Wing Yau
Keung, Ashley
Hercules, Yamileth
Garcia, Mary Alexandra
Lim, Yew-Wei
Pham, Lien
author_facet Wong, Albert
Cheng, Florence Wing Yau
Keung, Ashley
Hercules, Yamileth
Garcia, Mary Alexandra
Lim, Yew-Wei
Pham, Lien
contents Topic modelling has become increasingly popular for summarizing text data, such as social media posts and articles. However, topic modelling is usually completed in one shot. Assessing the quality of the resulting topics is challenging. No effective methods or measures have been developed for assessing the results or for making further enhancements to the topics. In this research, we propose to use an iterative process to perform topic modelling that gives rise to a sense of completeness of the resulting topics when the process is complete. Using the BERTopic package, a popular method in topic modelling, we demonstrate how the modelling process can be applied iteratively to arrive at a set of topics that could not be further improved upon, using one of the three selected measures for clustering comparison as the decision criterion. This demonstration is conducted using a subset of the COVIDSenti-A dataset. The early success leads us to believe that further research using this approach in conjunction with other topic modelling algorithms could be viable.
format Preprint
id arxiv_https___arxiv_org_abs_2407_17892
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle An Iterative Approach to Topic Modelling
Wong, Albert
Cheng, Florence Wing Yau
Keung, Ashley
Hercules, Yamileth
Garcia, Mary Alexandra
Lim, Yew-Wei
Pham, Lien
Machine Learning
Artificial Intelligence
title An Iterative Approach to Topic Modelling
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2407.17892
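
The abstract describes repeating the modelling step until a clustering comparison measure indicates the topics can no longer be improved. The record does not name the three measures or give the exact stopping rule, so the following is only a minimal sketch under stated assumptions: the adjusted Rand index stands in for the comparison measure, the loop stops when two successive rounds produce the same partition, and `fit_once`, `iterate_topics`, and `toy_fit` are hypothetical names that do not come from the paper or from BERTopic.

```python
from collections import Counter
from math import comb
from typing import Callable, List, Optional, Sequence, Tuple

Labels = List[int]
FitFn = Callable[[Sequence[str], Optional[Labels]], Labels]


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Adjusted Rand index between two labelings of the same documents."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    total = comb(len(a), 2)
    if total == 0:
        return 1.0  # fewer than two documents: nothing to compare
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both labelings are trivial (one cluster, or all singletons)
    return (sum_cells - expected) / (max_index - expected)


def iterate_topics(
    docs: Sequence[str],
    fit_once: FitFn,
    measure: Callable[[Sequence[int], Sequence[int]], float] = adjusted_rand_index,
    tol: float = 1e-9,
    max_rounds: int = 10,
) -> Tuple[Labels, List[float]]:
    """Re-run the modelling step until successive rounds agree.

    Assumed stopping rule (not taken from the paper): stop once the
    comparison measure between two consecutive labelings reaches 1.
    """
    labels = fit_once(docs, None)  # initial one-shot model
    history: List[float] = []
    for _ in range(max_rounds):
        new_labels = fit_once(docs, labels)  # refinement round
        score = measure(labels, new_labels)
        history.append(score)
        labels = new_labels
        if score >= 1.0 - tol:
            break
    return labels, history


def toy_fit(docs: Sequence[str], previous: Optional[Labels]) -> Labels:
    """Hypothetical stand-in for a real topic model, for illustration only."""
    if previous is None:
        # First pass: crude grouping by word count.
        return [len(d.split()) % 3 for d in docs]
    counts = Counter(previous)
    # Refinement: fold topics with a single document into an outlier topic -1.
    return [t if counts[t] > 1 else -1 for t in previous]
```

In practice `fit_once` would wrap a real topic model, for example a call to BERTopic's `fit_transform` on the documents, possibly restricted to documents that a previous round left as outliers. How the paper actually feeds one round into the next is not stated in this record, so that part is left to the caller.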