Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Clymer, Joshua; Weinbaum, Jonah; Kirk, Robert; Mai, Kimberly; Zhang, Selena; Davies, Xander
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.18003

_version_	1866916754933940224
author	Clymer, Joshua; Weinbaum, Jonah; Kirk, Robert; Mai, Kimberly; Zhang, Selena; Davies, Xander
author_facet	Clymer, Joshua; Weinbaum, Jonah; Kirk, Robert; Mai, Kimberly; Zhang, Selena; Davies, Xander
contents	Existing evaluations of AI misuse safeguards provide a patchwork of evidence that is often difficult to connect to real-world decisions. To bridge this gap, we describe an end-to-end argument (a "safety case") that misuse safeguards reduce the risk posed by an AI assistant to low levels. We first describe how a hypothetical developer red teams safeguards, estimating the effort required to evade them. Then, the developer plugs this estimate into a quantitative "uplift model" to determine how much the barriers introduced by safeguards dissuade misuse (https://www.aimisusemodel.com/). This procedure provides a continuous signal of risk during deployment that helps the developer rapidly respond to emerging threats. Finally, we describe how to tie these components together into a simple safety case. Our work provides one concrete path -- though not the only path -- to rigorously justifying that AI misuse risks are low.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_18003
institution	arXiv
publishDate	2025
record_format	arxiv
title	An Example Safety Case for Safeguards Against Misuse
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2505.18003

Similar Items