Bibliographic Details
Main Author:	Sophia, Franny Philos
Format:	Digital resource
Language:
Published:	Zenodo 2026
Subjects:	AI evaluation benchmark measurement validity anthropomorphism NIST
Online Access:	https://doi.org/10.5281/zenodo.19145174

Description:

<p>This document is a public comment submitted to the U.S. National Institute of Standards and Technology (NIST) in response to NIST AI 800-2 ipd, <em>Practices for Automated Benchmark Evaluations of Language Models</em> (January 2026, comment period closing March 31, 2026). The comment identifies a consequential gap in the draft: it gives no guidance on recognizing and mitigating anthropomorphic construct projection, that is, the uncritical application of human-derived cognitive categories (e.g., "reasoning," "understanding," "knowledge") as measurement constructs for AI systems whose information-processing architectures bear no established correspondence to human cognition. The comment shows how this gap manifests in four specific locations within the draft (Practice 1.1, Practice 1.2, the Glossary definition of "capability," and Practice 3.3) and proposes five concrete recommendations: (1) adding construct-applicability guidance to Practice 1.1, (2) recognizing anthropomorphic projection as a validity threat in Practice 1.2, (3) expanding Practice 3.3 to address ontological over-generalization, (4) adding "construct validity" to the Glossary with an explicit note on anthropomorphism, and (5) introducing a "Descriptive Neutrality" principle in Section 2.1.1. The analysis builds on and provides formal support for the framework developed in Sophia (2026), "The Anthropomorphic Trap" (Zenodo DOI: 10.5281/zenodo.18500433), and converges with the validity-centered approaches of Wallach et al. (2025) and Salaudeen et al. (2025), both already cited in the NIST draft.</p>
