Staff View: Towards Characterizing Cyber Networks with Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Hartsock, Alaric; Pereira, Luiz Manella; Fink, Glenn
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2411.07089

_version_	1866913575901069312
author	Hartsock, Alaric; Pereira, Luiz Manella; Fink, Glenn
author_facet	Hartsock, Alaric; Pereira, Luiz Manella; Fink, Glenn
contents	Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior. We believe adversarial activities, however they are disguised, are extremely difficult to completely obscure in high-dimensional space. In this paper, we employ these latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM). CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed. The model is deliberately overtrained on a sliding window of data to characterize each window closely. We use the Adjusted Rand Index (ARI) to compare the k-means clustering of CLEM output to expert labeling of the embeddings. Our approach demonstrates that there is promise in using natural language modeling to understand cyber data.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_07089
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Towards Characterizing Cyber Networks with Large Language Models Hartsock, Alaric Pereira, Luiz Manella Fink, Glenn Artificial Intelligence Cryptography and Security Machine Learning Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior. We believe adversarial activities, however they are disguised, are extremely difficult to completely obscure in high-dimensional space. In this paper, we employ these latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM). CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed. The model is deliberately overtrained on a sliding window of data to characterize each window closely. We use the Adjusted Rand Index (ARI) to compare the k-means clustering of CLEM output to expert labeling of the embeddings. Our approach demonstrates that there is promise in using natural language modeling to understand cyber data.
title	Towards Characterizing Cyber Networks with Large Language Models
topic	Artificial Intelligence; Cryptography and Security; Machine Learning
url	https://arxiv.org/abs/2411.07089

Similar Items