| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.20018468 |
Table of contents:
- [Metadata anonymized 2026-05-18 for privacy/blind-review hygiene. Permanent deletion requested via Zenodo Support. Original creator credit retained in private deposit history.] 402-episode (67 problems × 3 seeds × 2 conditions) Docker ground-truth validation of WhyLab causal-audit C2 vs baseline on Gemini 2.5 Flash. Published as an honest null result (adaptive C2 did not outperform fixed C2 on the SWE-bench slice) to support calibration claims rather than overclaim. Companion to [redacted venue] WhyLab [redacted venue]. Hugging Face: `neogenesislab/whylab-gemini-2-5-docker-validation`.
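
The episode count in the description follows from a full factorial design: every problem is run under every seed and every condition. A minimal sketch of that grid is given below. The problem identifiers, seed values, and condition labels are hypothetical placeholders, since the record states only the counts (67, 3, and 2), not the actual names used in the deposit.

```python
from itertools import product

# Hypothetical labels: the record gives only the counts, not the identifiers.
problems = [f"problem_{i:02d}" for i in range(67)]  # 67 problems
seeds = [0, 1, 2]                                   # 3 seeds (values assumed)
conditions = ["baseline", "c2"]                     # 2 conditions (names assumed)

# One episode per (problem, seed, condition) combination.
episodes = list(product(problems, seeds, conditions))

print(len(episodes))  # 67 * 3 * 2 = 402
```

Enumerating the grid this way makes it easy to check that a results file covers every cell exactly once before comparing conditions.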