Staff View: Chasing Shadows: Pitfalls in LLM Security Research :: Library Catalog


Bibliographic Details
Main Authors:	Evertz, Jonathan, Risse, Niklas, Neuer, Nicolai, Müller, Andreas, Normann, Philipp, Sapia, Gaetano, Gupta, Srishti, Pape, David, Shaw, Soumya, Srivastav, Devansh, Wressnegger, Christian, Quiring, Erwin, Eisenhofer, Thorsten, Arp, Daniel, Schönherr, Lea
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2512.09549

_version_	1866909962093985792
author	Evertz, Jonathan; Risse, Niklas; Neuer, Nicolai; Müller, Andreas; Normann, Philipp; Sapia, Gaetano; Gupta, Srishti; Pape, David; Shaw, Soumya; Srivastav, Devansh; Wressnegger, Christian; Quiring, Erwin; Eisenhofer, Thorsten; Arp, Daniel; Schönherr, Lea
contents	Large language models (LLMs) are increasingly prevalent in security research. Their unique characteristics, however, introduce challenges that undermine established paradigms of reproducibility, rigor, and evaluation. Prior work has identified common pitfalls in traditional machine learning research, but these studies predate the advent of LLMs. In this paper, we identify nine common pitfalls that have become (more) relevant with the emergence of LLMs and that can compromise the validity of research involving them. These pitfalls span the entire computation process, from data collection, pre-training, and fine-tuning to prompting and evaluation. We assess the prevalence of these pitfalls across all 72 peer-reviewed papers published at leading Security and Software Engineering venues between 2023 and 2024. We find that every paper contains at least one pitfall, and each pitfall appears in multiple papers. Yet only 15.7% of the present pitfalls were explicitly discussed, suggesting that the majority remain unrecognized. To understand their practical impact, we conduct four empirical case studies showing how individual pitfalls can mislead evaluation, inflate performance, or impair reproducibility. Based on our findings, we offer actionable guidelines to support the community in future work.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_09549
institution	arXiv
publishDate	2025
record_format	arxiv
title	Chasing Shadows: Pitfalls in LLM Security Research
topic	Cryptography and Security
url	https://arxiv.org/abs/2512.09549
