Bibliographic details
Main author: Proof Engine
Format: Digital resource
Language:
Published: Zenodo 2026
Subjects:
Online access: https://doi.org/10.5281/zenodo.19489820
Contents:
  • <p>Automated fact-verification of the claim: "<em>AI hallucinations occur on fewer than 5% of factual questions</em>"</p>
    <p><strong>Verdict: DISPROVED</strong></p>
    <h3>Key Findings</h3>
    <ul>
    <li>OpenAI's o3 model hallucinated <strong>33% of the time</strong> on the PersonQA benchmark (B1), nearly 7x the claimed ceiling of 5%.</li>
    <li>ChatGPT generates hallucinated content in approximately <strong>19.5% of its responses</strong> across general testing (B2), nearly 4x the claimed ceiling.</li>
    <li>On the AA-Omniscience benchmark (6,000 factual questions across 42 topics), even the <strong>best-performing model hallucinates 22%</strong> of the time (B3).</li>
    <li>No major AI model achieves a hallucination rate below 5% on open-ended factual question benchmarks. Sub-5% rates exist only on narrow grounded summarization tasks, not on factual QA.</li>
    </ul>
    <h3>Files</h3>
    <ul>
    <li><strong>proof.py</strong>: Re-runnable Python verification script</li>
    <li><strong>proof.md</strong>: Structured proof report</li>
    <li><strong>proof_audit.md</strong>: Full verification audit trail</li>
    <li><strong>proof_narrative.md</strong>: Plain-language summary</li>
    <li><strong>proof.json</strong>: Machine-readable structured data</li>
    </ul>
    <p>Generated by <a href="https://github.com/yaniv-golan/proof-engine">Proof Engine</a> v1.1.0.</p>
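The comparison behind the verdict can be sketched in a few lines of Python. This is an illustration only and is not the record's proof.py, whose contents are not shown here. The names `CLAIMED_CEILING`, `reported_rates`, `ratio_to_ceiling`, and `claim_holds` are hypothetical, and the rates are copied from the Key Findings (B1 to B3). The sketch assumes the claim is treated as disproved as soon as any reported rate reaches or exceeds the 5% ceiling.

```python
# Illustrative sketch only; NOT the proof.py shipped with the record.
CLAIMED_CEILING = 5.0  # percent, taken from the claim under test

# Hallucination rates (percent) as quoted in the Key Findings.
reported_rates = {
    "B1: o3 on PersonQA": 33.0,
    "B2: ChatGPT, general testing": 19.5,
    "B3: best model on AA-Omniscience": 22.0,
}


def ratio_to_ceiling(rate, ceiling=CLAIMED_CEILING):
    """Return how many times larger a reported rate is than the ceiling."""
    return rate / ceiling


def claim_holds(rates, ceiling=CLAIMED_CEILING):
    """Assumed rule: the claim survives only if every rate is below the ceiling."""
    return all(rate < ceiling for rate in rates.values())


if __name__ == "__main__":
    for label, rate in reported_rates.items():
        print(f"{label}: {rate}% is {ratio_to_ceiling(rate):.1f}x the ceiling")
    print("Verdict:", "SUPPORTED" if claim_holds(reported_rates) else "DISPROVED")
```

The ratios make the "nearly 7x" and "nearly 4x" wording in the findings concrete: 33 / 5 = 6.6 and 19.5 / 5 = 3.9.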