Bibliographic Details
Main Authors: Monnet, Nathan, Maréchal, Loïc, Jang-Jaccard, Julian, Mermoud, Alain
Format: Preprint
Published: 2024
Subjects: Computational Engineering, Finance, and Science
Online Access: https://arxiv.org/abs/2410.11573
_version_ 1866910774910255104
author Monnet, Nathan
Maréchal, Loïc
Jang-Jaccard, Julian
Mermoud, Alain
contents We introduce a novel approach to text classification by combining doc2vec embeddings with advanced clustering techniques to improve the analysis of specialized, high-dimensional textual data. We integrate unsupervised methods such as Louvain, K-means, and Spectral clustering with doc2vec to enhance the detection of semantic patterns across a large corpus. As a case study, we apply this methodology to cybersecurity risk analysis using the MITRE ATT&CK framework to structure and reduce the dimensionality of cyberattack tactics. Louvain clustering proved the most effective among the tested methods, achieving the best balance between cluster coherence and computational efficiency. Our approach identifies four "super tactics," demonstrating how clustering improves thematic coherence and risk attribution. The results validate the utility of combining doc2vec with clustering, particularly Louvain, for enhancing topic modeling and text classification.
format Preprint
id arxiv_https___arxiv_org_abs_2410_11573
institution arXiv
publishDate 2024
record_format arxiv
title Clustering doc2vec output for topic-dimensionality reduction: A MITRE ATT&CK calibration
topic Computational Engineering, Finance, and Science
url https://arxiv.org/abs/2410.11573