Staff View: Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Maier, Antoine; Maier, Aude; David, Tom
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.02840

_version_	1866917262014808064
author	Maier, Antoine; Maier, Aude; David, Tom
author_facet	Maier, Antoine; Maier, Aude; David, Tom
contents	A common but rarely examined assumption in machine learning is that training yields models that actually satisfy their specified objective function. We call this the Objective Satisfaction Assumption (OSA). Although deviations from OSA are acknowledged, their implications are overlooked. We argue, in a learning-paradigm-agnostic framework, that OSA fails in realistic conditions: approximation, estimation, and optimization errors guarantee systematic deviations from the intended objective, regardless of the quality of its specification. Beyond these technical limitations, perfectly capturing and translating the developer's intent, such as alignment with human preferences, into a formal objective is practically impossible, making misspecification inevitable. Building on recent mathematical results, we argue that, absent a mathematical characterization of these gaps, they are indistinguishable from those that collapse into Goodhart's law failure modes under strong optimization pressure. Because the Goodhart breaking point cannot be located ex ante, a principled limit on the optimization of General-Purpose AI systems is necessary. Absent such a limit, continued optimization is liable to push systems into predictable and irreversible loss of control.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_02840
institution	arXiv
publishDate	2025
record_format	arxiv
title	Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
topic	Artificial Intelligence
url	https://arxiv.org/abs/2510.02840

Similar Items