Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Shuhao, Li, Jiarui, Cao, Qi, Zhang, Ruiyi, Xie, Pengtao
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security Machine Learning
Online Access:	https://arxiv.org/abs/2605.30837
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916064364855296
author	Zhang, Shuhao Li, Jiarui Cao, Qi Zhang, Ruiyi Xie, Pengtao
author_facet	Zhang, Shuhao Li, Jiarui Cao, Qi Zhang, Ruiyi Xie, Pengtao
contents	Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_30837
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense Zhang, Shuhao Li, Jiarui Cao, Qi Zhang, Ruiyi Xie, Pengtao Cryptography and Security Machine Learning Prompt-injection detectors are heterogeneous: each is strong on a different slice of attacks, and none is always reliable. Yet existing systems still treat detection as a fixed single-detector pipeline, committing every request to one detector's blind spots. We reframe defense as detector allocation: given a heterogeneous pool, decide per request which detectors to run and whether to escalate to an LLM judge. Our framework SCOUT (Scalable and Controllable Outcome-prediction for Uncertainty-aware Triage) makes this decision dynamic by predicting each detector's per-sample reliability and latency from how it behaved on similar past inputs, and exposes a single safety-utility threshold to the operator (where utility bundles benign-pass rate and wall-clock). To evaluate this setting, we build SCOUT-450, a benchmark that captures the structurally complex, agent-facing injections that older prompt-injection sets under-represent. On SCOUT-450, a safety-oriented operating point reduces attack-success rate by 46% and total wall-clock by 40% relative to an always-on GPT-4o judge, at a 5.1-point benign-utility drop. SCOUT also transfers to three external benchmarks (BIPIA, IPI, and IHEval), improving the safety-utility frontier.
title	Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense
topic	Cryptography and Security Machine Learning
url	https://arxiv.org/abs/2605.30837

Similar Items