Bibliographic Details
Main Authors: Ogunleye, Bayode; Maswera, Tonderai; Hirsch, Laurence; Gaudoin, Jotham; Brunsdon, Teresa
Format: Preprint
Published: 2024
Subjects: Information Retrieval; Artificial Intelligence; Machine Learning; Computation; H.3.3
Online Access: https://arxiv.org/abs/2402.03176
author Ogunleye, Bayode
Maswera, Tonderai
Hirsch, Laurence
Gaudoin, Jotham
Brunsdon, Teresa
contents Topic modelling is a prominent task for automatic topic extraction in many applications, such as sentiment analysis and recommendation systems. The approach is vital for service industries that need to monitor customer discussions. Traditional approaches such as Latent Dirichlet Allocation (LDA) have performed well for topic discovery; however, their results are inconsistent because they suffer from data sparseness and cannot model word order in a document. This study therefore presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means clustering in the BERTopic architecture. We prepared a new dataset of tweets from customers of Nigerian banks and used it to compare the topic modelling approaches. Our findings show that KernelPCA and K-means in the BERTopic architecture produced coherent topics, with a coherence score of 0.8463.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03176
institution arXiv
publishDate 2024
record_format arxiv
title Comparison of Topic Modelling Approaches in the Banking Context
topic Information Retrieval
Artificial Intelligence
Machine Learning
Computation
H.3.3
url https://arxiv.org/abs/2402.03176