Bibliographic details
Main author: Author, Anonymous
Format: Digital resource
Language: English
Published: Zenodo 2026
Online access: https://doi.org/10.5281/zenodo.20018468
Table of contents:
  • [Metadata anonymized 2026-05-18 for privacy/blind-review hygiene. Permanent deletion requested via Zenodo Support. Original creator credit retained in private deposit history.]
    402-episode (67 problems × 3 seeds × 2 conditions) Docker ground-truth validation of WhyLab causal-audit C2 vs baseline on Gemini 2.5 Flash. Published as an honest null result (adaptive C2 did not outperform fixed C2 on the SWE-bench slice) to support calibration claims rather than overclaim. Companion to [redacted venue] WhyLab [redacted venue]. Hugging Face: neogenesislab/whylab-gemini-2-5-docker-validation.