Kim, E., Gu, C., Tiwari, V., & Kolter, J. Z. (2026). Measuring Five-Nines Reliability: Sample-Efficient LLM Evaluation in Saturated Benchmarks.
Chicago Style (17th ed.) Citation: Kim, Eungyeup, Chenchen Gu, Vashisth Tiwari, and J. Zico Kolter. Measuring Five-Nines Reliability: Sample-Efficient LLM Evaluation in Saturated Benchmarks. 2026.
MLA (9th ed.) Citation: Kim, Eungyeup, et al. Measuring Five-Nines Reliability: Sample-Efficient LLM Evaluation in Saturated Benchmarks. 2026.