Bibliographic Details
Main Authors: Goertzel, Ben; Yibelo, Paulos
Format: Preprint
Published: 2025
Subjects: Cryptography and Security; Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.21029
Abstract: We propose a robust transformer architecture designed to prevent prompt injection attacks and ensure secure, reliable response generation. Our PICO (Prompt Isolation and Cybersecurity Oversight) framework structurally separates trusted system instructions from untrusted user inputs through dual channels that are processed independently and merged only by a controlled, gated fusion mechanism. In addition, we integrate a specialized Security Expert Agent within a Mixture-of-Experts (MoE) framework and incorporate a Cybersecurity Knowledge Graph (CKG) to supply domain-specific reasoning. Our training design further ensures that the system prompt branch remains immutable while the rest of the network learns to handle adversarial inputs safely. This PICO framework is presented via a general mathematical formulation, then elaborated in terms of the specifics of transformer architecture, and fleshed out via hypothetical case studies including Policy Puppetry attacks. While the most effective implementation may involve training transformers in a PICO-based way from scratch, we also present a cost-effective fine-tuning approach.
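The abstract does not state the fusion equation. Purely as an illustration, one common way to gate two channels is g = sigmoid(W_g [s; u] + b_g) and h = g * s + (1 - g) * u (elementwise), where s is the trusted system-channel vector and u is the untrusted user-channel vector. This formulation and the names `gated_fusion`, `gate_weights`, and `gate_bias` are assumptions for illustration, not the authors' definition.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    system_repr: Sequence[float],
    user_repr: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Merge a trusted system vector s with an untrusted user vector u.

    Hypothetical gate (an assumption, not taken from the PICO paper):
        g_i = sigmoid(W_g[i] . [s; u] + b_g[i])
        h_i = g_i * s_i + (1 - g_i) * u_i
    The two channels are only combined here; neither input is modified.
    """
    d = len(system_repr)
    if len(user_repr) != d:
        raise ValueError("system and user channels must have the same dimension")
    if len(gate_weights) != d or len(gate_bias) != d:
        raise ValueError("need one gate row and one bias per output dimension")

    concat = list(system_repr) + list(user_repr)  # [s; u], length 2*d
    fused: List[float] = []
    for i in range(d):
        row = gate_weights[i]
        if len(row) != 2 * d:
            raise ValueError("each gate row must have length 2*d")
        logit = sum(w * x for w, x in zip(row, concat)) + gate_bias[i]
        g = sigmoid(logit)  # share of the trusted channel in dimension i
        fused.append(g * system_repr[i] + (1.0 - g) * user_repr[i])
    return fused
```

With all gate weights and biases at zero, every gate equals 0.5 and the output is the elementwise average of the two channels; a large positive bias drives the gate toward 1, so the trusted system channel dominates the fused representation.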
Institution: arXiv
Title: PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight
Topic: Cryptography and Security; Artificial Intelligence