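Staff View: Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs :: Library Catalog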


Bibliographic Details
Main Authors:	Yakymovych, Andrey; Singh, Abhishek
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.08888

_version_	1866929418370285568
author	Yakymovych, Andrey; Singh, Abhishek
author_facet	Yakymovych, Andrey; Singh, Abhishek
contents	Recent threat reports highlight that email remains the top vector for delivering malware to endpoints. Despite these statistics, the detection of malicious email attachments and URLs often neglects semantic cues, linguistic features, and contextual clues. Our study employs BERTopic unsupervised topic modeling to identify the common semantics and themes embedded in emails used to deliver malicious attachments and call-to-action URLs. We preprocess emails by extracting and sanitizing their content and use multilingual embedding models such as BGE-M3 to produce dense representations, which clustering algorithms (HDBSCAN and OPTICS) then use to group emails by semantic similarity. Phi3-Mini-4K-Instruct facilitates semantic analysis and hLDA aids thematic analysis, helping to characterize threat actor patterns. Our research will evaluate and compare different clustering algorithms on topic quantity, coherence, and diversity metrics, concluding with insights into the semantics and topics commonly used by threat actors to deliver malicious attachments and URLs, a significant contribution to the field of threat detection.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_08888
institution	arXiv
publishDate	2024
record_format	arxiv
title	Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs
topic	Machine Learning
url	https://arxiv.org/abs/2407.08888
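The abstract names topic coherence and topic diversity as evaluation metrics but does not define them. The sketch below assumes the widely used definitions: topic diversity as the share of unique words among every topic's top-k words, and coherence as mean NPMI estimated from document-level co-occurrence. The function names, the `top_k` default, and the toy email tokens are hypothetical illustrations, not the authors' implementation.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Share of unique words among the top_k words of all topics.

    1.0 means no two topics share a top word; values near 0 mean the
    topics are largely redundant.
    """
    words = [w for topic in topics for w in topic[:top_k]]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def npmi_coherence(topics, documents, top_k=10, eps=1e-12):
    """Mean NPMI over all word pairs in each topic's top_k words.

    Probabilities are estimated from document co-occurrence:
    p(w) is the fraction of documents that contain w.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return 0.0

    def doc_prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for w1, w2 in combinations(topic[:top_k], 2):
            p12 = doc_prob(w1, w2)
            if p12 == 0:
                # The words never co-occur: use the NPMI lower bound of -1.
                pair_scores.append(-1.0)
                continue
            pmi = math.log(p12 / (doc_prob(w1) * doc_prob(w2)))
            # eps avoids division by zero when the pair occurs in every document.
            pair_scores.append(pmi / (-math.log(p12) + eps))
        if pair_scores:
            topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores) if topic_scores else 0.0


# Hypothetical toy data: tokenized email bodies and two extracted topics.
example_docs = [
    ["invoice", "payment", "attached", "please"],
    ["invoice", "attached", "review"],
    ["password", "account", "verify", "login"],
    ["account", "verify", "urgent"],
]
example_topics = [["invoice", "payment", "attached"], ["password", "account", "verify"]]

print(topic_diversity(example_topics, top_k=3))  # 1.0
print(round(npmi_coherence(example_topics, example_docs, top_k=3), 3))  # 0.667
```

Libraries such as gensim offer coherence implementations that may estimate word probabilities differently (for example, with sliding windows over text rather than whole documents), so scores from this sketch are not directly comparable with published figures.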
