Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Khodorchenko, Maria; Butakov, Nikolay; Zuev, Maxim; Nasonov, Denis
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2410.00655

_version_	1866910627069427712
author	Khodorchenko, Maria; Butakov, Nikolay; Zuev, Maxim; Nasonov, Denis
author_facet	Khodorchenko, Maria; Butakov, Nikolay; Zuev, Maxim; Nasonov, Denis
contents	In this work, we present AutoTM 2.0, a framework for optimizing additively regularized topic models. Compared with the previous version, it adds a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists and non-specialists alike to work with text documents, whether to conduct exploratory data analysis or to perform clustering on an interpretable set of features. Quality evaluation is based on specially developed metrics, such as coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments. We show that AutoTM 2.0 outperforms the previous AutoTM, reporting results on 5 datasets with different features and in two different languages.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_00655
institution	arXiv
publishDate	2024
record_format	arxiv
title	AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2410.00655

Similar Items