Zhang, X., Ton, J., Shen, W., Wang, H., & Liu, Y. (2024). Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation.
Chicago Style (17th ed.) Citation: Zhang, Xiaoying, Jean-Francois Ton, Wei Shen, Hongning Wang, and Yang Liu. Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation. 2024.
MLA (9th ed.) Citation: Zhang, Xiaoying, et al. Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation. 2024.