Bibliographic Details
Main Authors: Stein, Merlin; Gandhi, Milan; Kriecherbauer, Theresa; Oueslati, Amin; Trager, Robert
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2407.20847
Table of Contents:
  • Artificial Intelligence (AI) Safety Institutes (AISIs) and governments worldwide are deciding whether to evaluate advanced AI themselves, support a private evaluation ecosystem, or do both. Evaluation regimes have been established in a wide range of industry contexts to monitor and evaluate firms' compliance with regulation. Evaluation is a necessary governance tool to understand and manage the risks of a technology. This paper draws on nine such regimes to inform (i) who should evaluate which parts of advanced AI; and (ii) how much capacity public bodies may need to evaluate advanced AI effectively. First, the effective distribution of responsibility between public and private evaluators depends heavily on specific industry and evaluation conditions. On the basis of advanced AI's risk profile, the sensitivity of information involved in the evaluation process, and the high costs of verifying safety and benefit claims of AI labs, we recommend that public bodies become directly involved in safety-critical AI model evaluations, especially gray- and white-box ones. Governance and security audits, which are well-established in other industry contexts, as well as black-box model evaluations, may be more efficiently provided by a private market of evaluators and auditors under public oversight. Second, to effectively fulfil their role in advanced AI audits, public bodies need extensive access to models and facilities. An AISI's capacity should scale with the industry's risk level, size, and market concentration, potentially requiring hundreds of employees for evaluations in large jurisdictions such as the EU or US, as in nuclear safety and life sciences.