Staff View: Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Shouqiao; Politi, Marcello; Marro, Samuele; Crapis, Davide
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.20925

_version_	1866910062671298560
author	Wang, Shouqiao; Politi, Marcello; Marro, Samuele; Crapis, Davide
author_facet	Wang, Shouqiao; Politi, Marcello; Marro, Samuele; Crapis, Davide
contents	As agentic systems move into real-world deployments, their decisions increasingly depend on external inputs such as retrieved content, tool outputs, and information provided by other actors. When these inputs can be strategically shaped by adversaries, the relevant security risk extends beyond a fixed library of prompt attacks to adaptive strategies that steer agents toward unfavorable outcomes. We propose profit-driven red teaming, a stress-testing protocol that replaces handcrafted attacks with a learned opponent trained to maximize its profit using only scalar outcome feedback. The protocol requires no LLM-as-judge scoring, attack labels, or attack taxonomy, and is designed for structured settings with auditable outcomes. We instantiate it in a lean arena of four canonical economic interactions, which provide a controlled testbed for adaptive exploitability. In controlled experiments, agents that appear strong against static baselines become consistently exploitable under profit-optimized pressure, and the learned opponent discovers probing, anchoring, and deceptive commitments without explicit instruction. We then distill exploit episodes into concise prompt rules for the agent, which make most previously observed failures ineffective and substantially improve target performance. These results suggest that profit-driven red-team data can provide a practical route to improving robustness in structured agent settings with auditable outcomes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_20925
institution	arXiv
publishDate	2026
record_format	arxiv
title	Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
topic	Artificial Intelligence
url	https://arxiv.org/abs/2603.20925

Similar Items