Contents: :: Library Catalog

Bibliographic Details
Main Author:	Proof Engine
Format:	Digital resource
Language:
Published:	Zenodo 2026
Subjects:	proof-engine fact-checking automated-verification
Online Access:	https://doi.org/10.5281/zenodo.19489820

Contents:

Automated fact-verification of the claim: "AI hallucinations occur on fewer than 5% of factual questions"
Verdict: DISPROVED
<h3>Key Findings</h3>
<ul>
<li>OpenAI's o3 model hallucinated 33% of the time on the PersonQA benchmark (B1), nearly 7x the claimed ceiling of 5%.</li>
<li>ChatGPT generates hallucinated content in approximately 19.5% of its responses across general testing (B2), nearly 4x the claimed ceiling.</li>
<li>On the AA-Omniscience benchmark (6,000 factual questions across 42 topics), even the best-performing model hallucinates 22% of the time (B3), more than 4x the claimed ceiling.</li>
<li>No major AI model achieves a hallucination rate below 5% on open-ended factual question benchmarks. Sub-5% rates exist only on narrow grounded summarization tasks, not factual QA.</li>
</ul>
<h3>Files</h3>
<ul>
<li>proof.py: Re-runnable Python verification script</li>
<li>proof.md: Structured proof report</li>
<li>proof_audit.md: Full verification audit trail</li>
<li>proof_narrative.md: Plain-language summary</li>
<li>proof.json: Machine-readable structured data</li>
</ul>
Generated by <a href="https://github.com/yaniv-golan/proof-engine">Proof Engine</a> v1.1.0.
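
The multiples quoted in the Key Findings above follow from dividing each reported rate by the 5% ceiling. The sketch below reproduces that arithmetic. It is an illustration only and does not show the contents of proof.py; the names `CLAIMED_CEILING`, `reported_rates`, and `ratio_to_ceiling` are hypothetical, and the "all rates below the ceiling" test is a simplifying assumption about how such a claim might be checked.

```python
# Illustrative sketch only: NOT the contents of proof.py.
# All names below are hypothetical; the rates are the ones quoted in the record.

CLAIMED_CEILING = 5.0  # percent, the upper bound stated in the claim

# Hallucination rates quoted in the Key Findings (percent).
reported_rates = {
    "B1: o3 on PersonQA": 33.0,
    "B2: ChatGPT, general testing": 19.5,
    "B3: best model on AA-Omniscience": 22.0,
}


def ratio_to_ceiling(rate_percent, ceiling_percent=CLAIMED_CEILING):
    """Return how many times a reported rate exceeds the claimed ceiling."""
    return rate_percent / ceiling_percent


ratios = {label: ratio_to_ceiling(rate) for label, rate in reported_rates.items()}

# Assumption: the claim survives only if every reported rate is below the ceiling.
claim_holds = all(rate < CLAIMED_CEILING for rate in reported_rates.values())

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x the {CLAIMED_CEILING:.0f}% ceiling")
print("claim holds:", claim_holds)
```

Under these assumptions the ratios come out at 6.6, 3.9, and 4.4, which matches the "nearly 7x" and "nearly 4x" wording in the findings, and the claim fails because every quoted rate exceeds 5%.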
