Bibliographic Details
Main Author: Tlaie, Alejandro
Format: Preprint
Published: 2024
Subjects: Computers and Society; Artificial Intelligence
Online Access: https://arxiv.org/abs/2410.19749
_version_ 1866914991876079616
author Tlaie, Alejandro
author_facet Tlaie, Alejandro
contents This paper leverages insights from Alignment Theory (AT) research, which primarily focuses on the potential pitfalls of technical alignment in Artificial Intelligence, to critically examine the European Union's Artificial Intelligence Act (EU AI Act). In the context of AT research, several key failure modes - such as proxy gaming, goal drift, reward hacking or specification gaming - have been identified. These can arise when AI systems are not properly aligned with their intended objectives. The central logic of this report is: what can we learn if we treat regulatory efforts in the same way as we treat advanced AI systems? As we systematically apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
format Preprint
id arxiv_https___arxiv_org_abs_2410_19749
institution arXiv
publishDate 2024
record_format arxiv
title Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks
topic Computers and Society
Artificial Intelligence
url https://arxiv.org/abs/2410.19749