
Bibliographic Details
Main Authors:	Mekaoui, Salma; Sofyan, Hiba; Amaaz, Imane; Benchrif, Imane; Zarghili, Arsalane; Chaker, Ilham; Nikolov, Nikola S.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2511.04248

_version_	1866911251583467520
author	Mekaoui, Salma; Sofyan, Hiba; Amaaz, Imane; Benchrif, Imane; Zarghili, Arsalane; Chaker, Ilham; Nikolov, Nikola S.
author_facet	Mekaoui, Salma; Sofyan, Hiba; Amaaz, Imane; Benchrif, Imane; Zarghili, Arsalane; Chaker, Ilham; Nikolov, Nikola S.
contents	Extracting topics from text has become an essential task, especially with the rapid growth of unstructured textual data. Most existing works rely on computationally intensive methods to address this challenge. In this paper, we argue that probabilistic and statistical approaches, such as topic modeling (TM), can offer effective alternatives that require fewer computational resources. TM is a statistical method that automatically discovers topics in large collections of unlabeled text; however, it represents each topic as a distribution over representative words, which is often hard to interpret. Our objective is to perform topic labeling by assigning meaningful labels to these sets of words. To achieve this without relying on computationally expensive models, we propose a graph-based approach that not only enriches topic words with semantically related terms but also explores the relationships among them. By analyzing these connections within the graph, we derive suitable labels that accurately capture each topic's meaning. We present a comparative study between our proposed method and several benchmarks, including ChatGPT-3.5, across two different datasets. Our method achieved consistently better results than traditional benchmarks in terms of BERTScore and cosine similarity and produced results comparable to ChatGPT-3.5, while remaining computationally efficient. Finally, we discuss future directions for topic labeling and highlight potential research avenues for enhancing interpretability and automation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_04248
institution	arXiv
publishDate	2025
record_format	arxiv
title	Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models
topic	Computation and Language
url	https://arxiv.org/abs/2511.04248

Similar Items