Bibliographic Details
Main Authors: Maier, Antoine; Maier, Aude; David, Tom
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2510.02840
_version_ 1866917262014808064
author Maier, Antoine
Maier, Aude
David, Tom
contents A common but rarely examined assumption in machine learning is that training yields models that actually satisfy their specified objective function. We call this the Objective Satisfaction Assumption (OSA). Although deviations from OSA are acknowledged, their implications are overlooked. We argue, in a learning-paradigm-agnostic framework, that OSA fails in realistic conditions: approximation, estimation, and optimization errors guarantee systematic deviations from the intended objective, regardless of the quality of its specification. Beyond these technical limitations, perfectly capturing and translating the developer's intent, such as alignment with human preferences, into a formal objective is practically impossible, making misspecification inevitable. Building on recent mathematical results, we argue that, absent a mathematical characterization of these gaps, they are indistinguishable from those that collapse into Goodhart's law failure modes under strong optimization pressure. Because the Goodhart breaking point cannot be located ex ante, a principled limit on the optimization of General-Purpose AI systems is necessary. Absent such a limit, continued optimization is liable to push systems into predictable and irreversible loss of control.
format Preprint
id arxiv_https___arxiv_org_abs_2510_02840
institution arXiv
publishDate 2025
record_format arxiv
title Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
topic Artificial Intelligence
url https://arxiv.org/abs/2510.02840