Feng, C., Shen, M., Balashankar, A., Gerner-Beuerle, C., & Rodrigues, M. R. D. (2026). Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges.
Chicago Style (17th ed.) Citation: Feng, Chen, Minghe Shen, Ananth Balashankar, Carsten Gerner-Beuerle, and Miguel R. D. Rodrigues. Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges. 2026.
MLA (9th ed.) Citation: Feng, Chen, et al. Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges. 2026.