Staff View: ACSE-Eval: Can LLMs threat model real-world cloud infrastructure? :: Library Catalog

Bibliographic Details
Main Authors:	Munshi, Sarthak; Pathak, Swapnil; Ghatode, Sonam; Priyadarshini, Thenuga; Chandramouleeswaran, Dhivya; Rana, Ashutosh
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.11565

_version_	1866908377930530816
author	Munshi, Sarthak; Pathak, Swapnil; Ghatode, Sonam; Priyadarshini, Thenuga; Chandramouleeswaran, Dhivya; Rana, Ashutosh
author_facet	Munshi, Sarthak; Pathak, Swapnil; Ghatode, Sonam; Priyadarshini, Thenuga; Chandramouleeswaran, Dhivya; Rana, Ashutosh
contents	While Large Language Models have shown promise in cybersecurity applications, their effectiveness in identifying security threats within cloud deployments remains unexplored. This paper introduces AWS Cloud Security Engineering Eval (ACSE-Eval), a novel dataset for evaluating LLMs' cloud security threat modeling capabilities. ACSE-Eval contains 100 production-grade AWS deployment scenarios, each featuring detailed architectural specifications, Infrastructure as Code implementations, documented security vulnerabilities, and associated threat modeling parameters. Our dataset enables systemic assessment of LLMs' abilities to identify security risks, analyze attack vectors, and propose mitigation strategies in cloud environments. Our evaluations on ACSE-Eval demonstrate that GPT 4.1 and Gemini 2.5 Pro excel at threat identification, with Gemini 2.5 Pro performing optimally in zero-shot scenarios and GPT 4.1 showing superior results in few-shot settings. While GPT 4.1 maintains a slight overall performance advantage, Claude 3.7 Sonnet generates the most semantically sophisticated threat models but struggles with threat categorization and generalization. To promote reproducibility and advance research in automated cybersecurity threat analysis, we open-source our dataset, evaluation metrics, and methodologies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_11565
institution	arXiv
publishDate	2025
record_format	arxiv
title	ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?
topic	Cryptography and Security; Artificial Intelligence
url	https://arxiv.org/abs/2505.11565

Similar Items