Bibliographic Details
Main Authors:	Morasso, Cristian; Halimi, Anisa; Hameed, Muhammad Zaid; Leith, Douglas
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2605.12565

Table of Contents:

Existing automated red-teaming pipelines often miss attacks that depend on attacker identity, framing, or multi-turn tactics, and this gap in coverage leads them to underestimate real-world risk. We introduce Persona-Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on attacker personas and strategy cards and runs parallel persona-conditioned beam searches to discover diverse, transferable jailbreaks. PCAP is orthogonal to the underlying search algorithm and substantially increases both attack success rate (ASR) and prompt diversity (e.g., ASR on GPT-OSS 120B rises from ≈58% to ≈97%), broadening the coverage of attack strategies.
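The control flow the abstract describes (one beam search per persona, run independently and in parallel) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `+card` tagging scheme, and the `toy_score` objective are all hypothetical placeholders, and a real evaluation pipeline would presumably replace the scorer with a judge model and the expansion step with a generator model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Scored = Tuple[float, str]


def toy_score(text: str) -> float:
    """Placeholder objective (assumption): count distinct '+card' tags.
    A real pipeline would call a judge model here instead."""
    return float(len({tok for tok in text.split() if tok.startswith("+")}))


def beam_search(
    seed: str,
    expand: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    depth: int = 2,
) -> List[Scored]:
    """Generic beam search: keep the top `beam_width` candidates per step."""
    beam: List[Scored] = [(score(seed), seed)]
    for _ in range(depth):
        # Pool current beam members and their children, deduplicated by text.
        pool = {text: s for s, text in beam}
        for _, text in beam:
            for child in expand(text):
                pool[child] = score(child)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
        beam = [(s, t) for t, s in ranked[:beam_width]]
    return beam


def persona_conditioned_search(
    personas: Dict[str, List[str]],
    task: str,
    score: Callable[[str], float] = toy_score,
    beam_width: int = 2,
    depth: int = 2,
) -> Dict[str, List[Scored]]:
    """Run one independent beam search per persona, in parallel threads.

    `personas` maps a persona label to its list of strategy cards
    (both are hypothetical string stand-ins).
    """

    def run(item: Tuple[str, List[str]]) -> Tuple[str, List[Scored]]:
        persona, cards = item
        seed = f"[{persona}] {task}"  # the persona conditions the seed
        expand = lambda text: [f"{text} +{card}" for card in cards]
        return persona, beam_search(seed, expand, score, beam_width, depth)

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run, personas.items()))
```

Because the persona only affects the seed and the expansion step, `beam_search` itself could be swapped for another search routine without touching the persona loop, which is one way to read the abstract's claim that PCAP is orthogonal to the underlying search algorithm.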

Similar Items