Bibliographic Details
Main author:	Rodabaugh, Alexander
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	cognometric transport embedding-space transport label-free probes Procrustes refusal axis LLM alignment auditing construct validity preregistration styxx
Online access:	https://doi.org/10.5281/zenodo.20278945

Table of Contents:

Label-free cognometric transport (a linear Procrustes map fit between embedding spaces from an unlabeled corpus, used to score a behavioral axis in a foreign space) is widely useful but unevenly reliable across embedding families. We report a single, bounded empirical regularity: across 17 size-controlled label-free corpora spanning two studies (n=5 + n=12), four OpenAI target models, two foreign embedding spaces, and 75 evaluation prompts, **cross-family transport quality is governed by a measurable threshold in corpus↔domain overlap** (mean-max cosine to the eval prompts in the home space). Below overlap ≈ 0.31, cross-family transported AUC collapses to ~0.69; at and above the threshold, cross-family AUC clears 0.80 and tracks same-family. Same-family transport is essentially insensitive to overlap. A high-resolution 12-level replication recovered the cross-family threshold but **failed the preregistered same-family control criterion** (Spearman −0.41, limit ±0.40); we report this as a real bound on the claim, not a footnote. A separate cross-vendor preregistered test killed any universality reading: the same `mpnet × corpus_2` cell that is worst for OpenAI is worst for Anthropic (min transported 0.617), showing the residual failure mode lives at the corpus/foreign-space boundary, not at the vendor boundary. The claim is therefore narrow: the threshold is a property of the **corpus ↔ foreign-space pairing**, vendor-agnostic in that sense, and only validated same-family. It is not a universal AI-integrity result.
Repository: <a href="https://github.com/fathom-lab/styxx">fathom-lab/styxx</a> @ <code>58a1d98</code> (PyPI <code>styxx==7.4.1</code>).
Bundle contents: paper, figure, raw run JSON (<code>out_corpus_coverage_law.json</code>, <code>out_corpus_coverage_law_fine.json</code>, <code>out_cross_vendor_refusal_transport_confirm.json</code>), scripts (<code>corpus_coverage_law.py</code>, <code>corpus_coverage_law_fine.py</code>, <code>cross_vendor_refusal_transport_confirm.py</code>, <code>plot_threshold_law.py</code>), the related papers establishing the audit chain (corpus-coverage law original + fine replication, cross-vendor stress, cross-vendor preregistration-killed confirmation, refusal-transport stress boundary, styxx status consolidation map, research integrity protocol), and the styxx-on-paper self-audit (<code>threshold-law-self-audit-2026-05-18.md</code>), in which the paper is scored by the very instruments it documents.
Self-audit verdict: 0 cracks requiring revision; all 8 headline numbers match raw JSON within 0.005; integrity protocol rules visibly followed; construct-ceiling firings on the limits/integrity sections are register artifacts predicted by the consolidation map.
Honest bounds (also in the paper): the strict preregistered same-family flat-control criterion failed in the high-resolution 12-point replication (Spearman −0.41 vs ±0.40 limit); an independent cross-vendor preregistration was killed (min Anthropic transported AUC 0.617, below the 0.70 floor). Both are reported in the paper body, not in footnotes. This is a Zenodo methods deposit, not peer-reviewed, no arXiv endorsement claimed, no universality claimed.
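The two failed preregistered criteria quoted in this record come down to simple threshold comparisons. The sketch below only restates that arithmetic with the numbers given here; the function names are hypothetical, and treating the Spearman limit as inclusive (pass when |ρ| ≤ limit) is an assumption, not something the record specifies.

```python
def within_symmetric_limit(value, limit):
    """Pass when the statistic lies inside [-limit, +limit] (inclusive bound assumed)."""
    return abs(value) <= limit


def clears_floor(value, floor):
    """Pass when the statistic is at or above the floor (inclusive bound assumed)."""
    return value >= floor


# Numbers as quoted in the record.
spearman_same_family = -0.41   # 12-point replication, same-family control
spearman_limit = 0.40          # preregistered limit of +/-0.40
min_anthropic_auc = 0.617      # worst transported AUC in the cross-vendor test
auc_floor = 0.70               # preregistered floor

same_family_pass = within_symmetric_limit(spearman_same_family, spearman_limit)
cross_vendor_pass = clears_floor(min_anthropic_auc, auc_floor)

print("same-family flat-control criterion passed:", same_family_pass)
print("cross-vendor floor criterion passed:", cross_vendor_pass)
```

Under these assumptions both checks evaluate to a failure, which agrees with how the record reports them.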
Lineage: this deposit supplements <a href="https://doi.org/10.5281/zenodo.20130041">Fathom v23 / styxx v7.2.0</a> and the <code>styxx</code> tool (<a href="https://github.com/fathom-lab/styxx">repo</a>, <a href="https://pypi.org/project/styxx/7.4.1/">PyPI 7.4.1</a>, commit <code>58a1d98</code>). It is methodologically downstream of the Fathom working-paper series (<a href="https://doi.org/10.5281/zenodo.19609853">10.5281/zenodo.19609853</a>, <a href="https://doi.org/10.5281/zenodo.19502710">10.5281/zenodo.19502710</a>, <a href="https://doi.org/10.5281/zenodo.19468271">10.5281/zenodo.19468271</a>) and of the Fathom Cognometric series (<a href="https://doi.org/10.5281/zenodo.19777921">Every Mind Leaves Vitals</a>, <a href="https://doi.org/10.5281/zenodo.19758619">styxx v6.2.0 ref impl</a>, <a href="https://doi.org/10.5281/zenodo.19502716">Fathom Cognitive Atlas v0.3</a>). It is not a continuation of the Fathom depth/geometry line; it is a separate, narrower empirical finding about label-free cognometric transport, audited by the same research-integrity protocol.
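The quantities named in the abstract (a Procrustes map between embedding spaces, a mean-max cosine overlap, and an AUC on transported scores) can be illustrated with a minimal NumPy sketch. This is not the deposit's code and does not use the `styxx` API. It assumes an orthogonal Procrustes fit between two spaces of equal dimension, maps foreign embeddings into the home space before projecting onto the axis, and takes the overlap as the mean over evaluation prompts of the maximum cosine to any corpus item; the record does not pin down any of these choices. All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def unit_rows(x):
    """L2-normalize each row (rows are assumed to be non-zero)."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mean_max_cosine(corpus_emb, eval_emb):
    """Mean over eval prompts of the max cosine to any corpus item (assumed direction)."""
    sims = unit_rows(eval_emb) @ unit_rows(corpus_emb).T
    return float(sims.max(axis=1).mean())


def fit_procrustes(foreign, home):
    """Orthogonal W minimizing ||foreign @ W - home||_F; equal dimensions assumed."""
    u, _, vt = np.linalg.svd(foreign.T @ home)
    return u @ vt


def transported_scores(foreign_eval, w, home_axis):
    """Map foreign embeddings into the home space, then project onto the axis."""
    return (foreign_eval @ w) @ home_axis


def auc(scores, labels):
    """Rank-based AUC: P(positive score > negative score), ties counted as 0.5."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


# Synthetic sanity check: the "foreign" space is an exact rotation of the home space.
rng = np.random.default_rng(0)
n_corpus, dim = 50, 8
home_corpus = rng.normal(size=(n_corpus, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
foreign_corpus = home_corpus @ rotation.T   # so foreign_corpus @ rotation == home_corpus

w = fit_procrustes(foreign_corpus, home_corpus)
recovered = np.allclose(foreign_corpus @ w, home_corpus)

eval_home = rng.normal(size=(5, dim))       # stand-in for eval-prompt embeddings
overlap = mean_max_cosine(home_corpus, eval_home)
print("rotation recovered:", recovered, "| overlap:", round(overlap, 3))
```

In the noise-free rotation case the fitted map reproduces the home embeddings, so transported scores equal home-space scores. The regime the record describes is the one where the two spaces are not related by an exact rotation and the fit quality depends on how well the unlabeled corpus covers the evaluation domain.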
