Chen, Y., Ma, Y., Huang, X., Zhang, S., Chen, H., Wang, H., & Qi, G. (2026). StressEval: Failure-Driven Dynamic Benchmarking for Knowledge-Intensive Reasoning in Large Language Models.
Chicago Style (17th ed.) Citation: Chen, Yongrui, Yangyang Ma, Xiaoying Huang, Shenyu Zhang, Huajun Chen, Haofen Wang, and Guilin Qi. StressEval: Failure-Driven Dynamic Benchmarking for Knowledge-Intensive Reasoning in Large Language Models. 2026.
MLA (9th ed.) Citation: Chen, Yongrui, et al. StressEval: Failure-Driven Dynamic Benchmarking for Knowledge-Intensive Reasoning in Large Language Models. 2026.