Staff View: DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries :: Library Catalog

Bibliographic Details
Main Authors:	Altieri, Massimiliano; Hamon, Ronan; Corizzo, Roberto; Ceci, Michelangelo; Sanchez, Ignacio
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2603.11200

_version_	1866911506858246144
author	Altieri, Massimiliano; Hamon, Ronan; Corizzo, Roberto; Ceci, Michelangelo; Sanchez, Ignacio
author_facet	Altieri, Massimiliano; Hamon, Ronan; Corizzo, Roberto; Ceci, Michelangelo; Sanchez, Ignacio
contents	Network intrusion detection systems play a crucial role in the security strategy employed by organisations to detect and prevent cyberattacks. Such systems usually combine pattern-detection signatures with anomaly detection techniques powered by machine learning methods. However, the commonly proposed machine learning methods present drawbacks such as over-reliance on labelled data and limited generalisation capabilities. To address these issues, embedding-based methods have been introduced to learn representations from network data that generalise effectively to many downstream tasks; DNS traffic is a common source of such data, mainly because it is widely available. However, current approaches do not properly consider the contextual information shared among DNS queries. In this paper, we tackle this issue by proposing DNS-GT, a novel Transformer-based model that learns embeddings for domain names from sequences of DNS queries. The model is first pre-trained in a self-supervised fashion to learn the general behaviour of DNS activity. It can then be fine-tuned on specific downstream tasks, exploiting interactions with other relevant queries in a given sequence. Our experiments with real-world DNS data showcase the ability of our method to learn effective domain name representations. A quantitative evaluation on domain name classification and botnet detection tasks shows that our approach achieves better results than relevant baselines, creating opportunities for further exploration of large-scale language models for intrusion detection systems. Our code is available at: https://github.com/m-altieri/DNS-GT.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_11200
institution	arXiv
publishDate	2026
record_format	arxiv
title	DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries
topic	Cryptography and Security; Machine Learning
url	https://arxiv.org/abs/2603.11200
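
The contents field states that DNS-GT is pre-trained in a self-supervised fashion on sequences of DNS queries, but it does not say how sequences are formed or which pre-training objective is used. As an illustration only, the following Python sketch makes two assumptions that are not taken from the record: a sequence is one client's time-ordered DNS queries cut into fixed-length windows, and the objective is BERT-style masked-token prediction over whole domain names. The function names (`build_sequences`, `build_vocab`, `mask_sequence`), the special tokens, and the -100 ignore label are hypothetical choices. This is not the authors' implementation, which is published at https://github.com/m-altieri/DNS-GT.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical special tokens; the record does not describe DNS-GT's vocabulary.
PAD, MASK, UNK = "[PAD]", "[MASK]", "[UNK]"


def build_sequences(
    log: Sequence[Tuple[float, str, str]], max_len: int
) -> List[List[str]]:
    """Group (timestamp, client, domain) records per client, order each client's
    queries by timestamp, and cut the stream into windows of at most max_len.
    Grouping by client is an assumption made for this sketch."""
    per_client: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for ts, client, domain in log:
        # Normalise case and drop a trailing root dot ("example.com." -> "example.com").
        per_client[client].append((ts, domain.lower().rstrip(".")))
    sequences: List[List[str]] = []
    for client in sorted(per_client):
        domains = [d for _, d in sorted(per_client[client])]
        for start in range(0, len(domains), max_len):
            sequences.append(domains[start:start + max_len])
    return sequences


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct domain name an integer id after the special tokens."""
    vocab = {PAD: 0, MASK: 1, UNK: 2}
    for seq in sequences:
        for domain in seq:
            vocab.setdefault(domain, len(vocab))
    return vocab


def mask_sequence(
    seq: Sequence[str], vocab: Dict[str, int], mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Simplified masked-token objective: each position is replaced by [MASK]
    with probability mask_prob. Labels keep the original id at masked positions
    and -100 elsewhere, the ignore index commonly used with cross-entropy losses."""
    ids = [vocab.get(d, vocab[UNK]) for d in seq]
    inputs: List[int] = []
    labels: List[int] = []
    for tok in ids:
        if rng.random() < mask_prob:
            inputs.append(vocab[MASK])
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(-100)
    return inputs, labels


if __name__ == "__main__":
    # Toy log of (timestamp, client IP, queried domain); the values are made up.
    log = [
        (1.0, "10.0.0.1", "example.com."),
        (2.0, "10.0.0.2", "cdn.example.net"),
        (1.5, "10.0.0.1", "Mail.Example.com"),
        (3.0, "10.0.0.1", "example.com"),
    ]
    seqs = build_sequences(log, max_len=2)
    print(seqs)  # [['example.com', 'mail.example.com'], ['example.com'], ['cdn.example.net']]
    vocab = build_vocab(seqs)
    inputs, labels = mask_sequence(seqs[0], vocab, mask_prob=0.15, rng=random.Random(0))
    print(inputs, labels)
```

Treating each full domain name as a single token keeps the example short; a model could instead tokenise individual labels or characters, and the record does not say which granularity DNS-GT uses. The masking is also simplified: every selected position becomes [MASK], omitting BERT's 80/10/10 replacement rule.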
