Bibliographic Details
Main Authors: Li, Hongyu; Liu, Kuan; Chen, Yuan; Hu, Juntao; Lu, Huimin; Chen, Guanjie; Liu, Xue; Lu, Guangming; Huang, Hong
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access: https://arxiv.org/abs/2603.00166
author Li, Hongyu
Liu, Kuan
Chen, Yuan
Hu, Juntao
Lu, Huimin
Chen, Guanjie
Liu, Xue
Lu, Guangming
Huang, Hong
contents Recent advances in generative AI have shown human-level performance in complex content creation. However, we identify a "Paradox of Simplicity": models that can render complex scenes often fail at trivial, low-entropy tasks, such as generating a uniform pure color image. We argue that this is a systemic failure related to uncontrollable emergent abilities. As models scale, strong priors for aesthetics and complexity override deterministic simplicity, creating an "aesthetic bias" that hinders the model's transition from data simulation to true intellectual abstraction. To investigate this problem, we formalize the concept of AI Obedience, a hierarchical framework that grades a model's ability to move from probabilistic approximation to pixel-level determinism (Levels 1 to 5). We introduce Violin, the first systematic benchmark designed to evaluate Level 4 Obedience through three deterministic tasks: color purity, image masking, and geometric shape generation. Using Violin, we evaluate several state-of-the-art models and find that closed-source models generally outperform open-source ones in deterministic precision. Interestingly, performance on our benchmark correlates with performance on natural image generation benchmarks. Our work provides a foundational framework and tools for achieving better alignment between human instructions and model outputs.
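The abstract does not say how Violin scores the color purity task. Purely as an illustration of what "pixel-level determinism" means for a pure color image, the sketch below assumes an RGB image stored as rows of `(r, g, b)` tuples and a hypothetical exact-match criterion with an optional per-channel tolerance. The names `color_purity`, `is_pure`, and `tol` are hypothetical and are not taken from the paper or from Violin.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def color_purity(image: Sequence[Sequence[Pixel]], target: Pixel, tol: int = 0) -> float:
    """Fraction of pixels whose every RGB channel lies within `tol` of `target`.

    Hypothetical metric for illustration only; not Violin's definition.
    """
    total = 0
    matching = 0
    for row in image:
        for pixel in row:
            total += 1
            # A pixel counts as pure only if all three channels are close enough.
            if all(abs(c - t) <= tol for c, t in zip(pixel, target)):
                matching += 1
    if total == 0:
        raise ValueError("image must contain at least one pixel")
    return matching / total


def is_pure(image: Sequence[Sequence[Pixel]], target: Pixel, tol: int = 0) -> bool:
    # Strict, deterministic pass/fail: every single pixel must match the target.
    return color_purity(image, target, tol) == 1.0


if __name__ == "__main__":
    red = (255, 0, 0)
    flat = [[red] * 4 for _ in range(4)]      # 4x4 uniform red image
    noisy = [row[:] for row in flat]
    noisy[0][0] = (250, 3, 0)                 # one slightly off-color pixel
    print(color_purity(flat, red))            # 1.0
    print(color_purity(noisy, red))           # 0.9375 (15 of 16 pixels)
    print(color_purity(noisy, red, tol=5))    # 1.0
```

A single stray pixel is enough to fail the strict `is_pure` check, which mirrors the kind of low-entropy, zero-tolerance requirement the abstract contrasts with aesthetically rich generation.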
format Preprint
id arxiv_https___arxiv_org_abs_2603_00166
institution arXiv
publishDate 2026
record_format arxiv
title Exploring the AI Obedience: Why is Generating a Pure Color Image Harder than CyberPunk?
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2603.00166