Bibliographic Details
Main Authors: Carbonetto, Peter; Sarkar, Abhishek; Wang, Zihao; Stephens, Matthew
Format: Preprint
Published: 2021
Subjects: Machine Learning; Computation
Online Access: https://arxiv.org/abs/2105.13440
author Carbonetto, Peter
Sarkar, Abhishek
Wang, Zihao
Stephens, Matthew
contents In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not been exploited previously to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often resulting in much better fits and in less time than existing algorithms for topic models. We also formally make the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and using this result we show that the expectation maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF (the only difference being that the operations are performed in a different order). Our methods are implemented in the R package fastTopics.
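To make the NMF connection described in the abstract concrete, here is a minimal Python sketch, assuming the standard Poisson NMF formulation in which a document-by-word count matrix X is approximated by L F^T. It uses the classic multiplicative updates (Lee and Seung's updates for the KL divergence, equivalently the Poisson likelihood), not the optimization methods implemented in fastTopics, and then applies the usual rescaling that turns a Poisson NMF fit into the topic proportions and topic word frequencies of a multinomial topic model. All function names (poisson_nmf_mu, to_topic_model, and so on) are hypothetical, and the toy count matrix is made up for illustration.

```python
import math
import random

# Hypothetical helper names; this is an illustrative sketch, not fastTopics code.

def fitted_rates(L, F):
    """Return the n x m matrix of Poisson rates L F^T (L is n x k, F is m x k)."""
    k = len(L[0])
    return [[sum(li[t] * fj[t] for t in range(k)) for fj in F] for li in L]

def poisson_loglik(X, L, F):
    """Poisson log-likelihood of counts X, omitting the constant -log(x!) terms."""
    total = 0.0
    for x_row, r_row in zip(X, fitted_rates(L, F)):
        for x, r in zip(x_row, r_row):
            if x > 0:
                total += x * math.log(r)
            total -= r
    return total

def poisson_nmf_mu(X, k, iters=100, seed=0, eps=1e-12):
    """Classic multiplicative updates for Poisson (KL) NMF: X ~ L F^T."""
    n, m = len(X), len(X[0])
    rng = random.Random(seed)
    L = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n)]
    F = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(m)]
    for _ in range(iters):
        # Update F with L fixed, using ratios R[i][j] = X[i][j] / (L F^T)[i][j].
        R = [[x / max(r, eps) for x, r in zip(xr, rr)]
             for xr, rr in zip(X, fitted_rates(L, F))]
        l_sums = [sum(L[i][t] for i in range(n)) for t in range(k)]
        F = [[F[j][t] * sum(L[i][t] * R[i][j] for i in range(n)) / max(l_sums[t], eps)
              for t in range(k)] for j in range(m)]
        # Update L with the new F fixed (ratios recomputed with the new F).
        R = [[x / max(r, eps) for x, r in zip(xr, rr)]
             for xr, rr in zip(X, fitted_rates(L, F))]
        f_sums = [sum(F[j][t] for j in range(m)) for t in range(k)]
        L = [[L[i][t] * sum(F[j][t] * R[i][j] for j in range(m)) / max(f_sums[t], eps)
              for t in range(k)] for i in range(n)]
    return L, F

def to_topic_model(L, F):
    """Rescale a Poisson NMF fit: topic proportions (rows sum to 1) and
    topic word frequencies (each topic's column sums to 1)."""
    k = len(F[0])
    s = [sum(row[t] for row in F) for t in range(k)]
    topics = [[row[t] / s[t] for t in range(k)] for row in F]
    props = []
    for row in L:
        w = [row[t] * s[t] for t in range(k)]
        z = sum(w)
        props.append([v / z for v in w])
    return props, topics

# Toy example with made-up counts (4 documents, 4 words, 2 topics).
X = [[5, 0, 2, 1], [0, 3, 1, 4], [6, 1, 3, 0], [1, 4, 0, 5]]
L0, F0 = poisson_nmf_mu(X, k=2, iters=0)    # the random starting point
L, F = poisson_nmf_mu(X, k=2, iters=100)
print(poisson_loglik(X, L0, F0) <= poisson_loglik(X, L, F))  # True
props, topics = to_topic_model(L, F)
```

Each multiplicative update never decreases the Poisson log-likelihood, so the final fit scores at least as well as its starting point. After an L update, the fitted row totals of L F^T also equal the observed document lengths, which is why the rescaled rows of L can be read as topic proportions.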
format Preprint
id arxiv_https___arxiv_org_abs_2105_13440
institution arXiv
publishDate 2021
record_format arxiv
title Non-negative matrix factorization algorithms generally improve topic model fits
topic Machine Learning
Computation
url https://arxiv.org/abs/2105.13440