Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Park, Jimin, Ji, AHyun, Park, Minji, Rahman, Mohammad Saidur, Oh, Se Eun
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.01110
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913632999178240
author	Park, Jimin Ji, AHyun Park, Minji Rahman, Mohammad Saidur Oh, Se Eun
author_facet	Park, Jimin Ji, AHyun Park, Minji Rahman, Mohammad Saidur Oh, Se Eun
contents	Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time. In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model's hidden representations. Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario -- where new classes are introduced continuously over multiple tasks -- demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55% on Windows malware samples, significantly outperforming other GR-based models by 28%. This study provides practical insights for advancing GR-based malware classification systems. The implementation is available at \url {https://github.com/MalwareReplayGAN/MalCL}\footnote{The code will be made public upon the presentation of the paper}.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_01110
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification Park, Jimin Ji, AHyun Park, Minji Rahman, Mohammad Saidur Oh, Se Eun Cryptography and Security Artificial Intelligence Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time. In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model's hidden representations. Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario -- where new classes are introduced continuously over multiple tasks -- demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55% on Windows malware samples, significantly outperforming other GR-based models by 28%. This study provides practical insights for advancing GR-based malware classification systems. The implementation is available at \url {https://github.com/MalwareReplayGAN/MalCL}\footnote{The code will be made public upon the presentation of the paper}.
title	MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
topic	Cryptography and Security Artificial Intelligence
url	https://arxiv.org/abs/2501.01110

Similar Items