Bibliographic Details
Main Authors:	Bao, Han, Huang, Yue, Wang, Xiaoda, Zhang, Zheyuan, Zhou, Yujun, Yang, Carl, Zhang, Xiangliang, Ye, Yanfang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.20042

Abstract:

General Alignment has improved average-case helpfulness and safety, but current alignment practice still rewards confident, single-turn responses. The problem is not only that models fail on edge cases; it is that current evaluation makes many of these failures hard to see. We take the position that alignment must move beyond average-case evaluation by making failures under value conflict, plural stakeholder disagreement, and epistemic ambiguity visible and actionable. Scalar rewards compress diverse values into a single number; data and evaluation regimes collapse, filter, or fail to elicit the cases where alignment is hardest; and governance often lacks mechanisms for adjudicating contested cases. These blind spots produce value flattening, representation loss, and uncertainty blindness. We use Edge alignment to name a detection, evaluation, and governance agenda for surfacing these failures and connecting them to appropriate interventions. Rather than a single training objective, Edge alignment defines the conditions under which standard alignment should yield to mechanisms that preserve multidimensional value structure, represent plural perspectives, and support uncertainty-aware interaction. A pilot diagnostic set of 91 edge cases and four contemporary models illustrates that ordinary helpfulness and safety readings can miss process failures that edge-aware evaluation exposes. We outline operational edge signals, process-aware evaluation criteria, and a three-phase process stack that reframes alignment as a lifecycle problem of dynamic normative governance.

Similar Items