APA (7th ed.) Citation

Liu, S., Chen, X., Urcelay, B. M., & Croce, F. (2026). Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders.

Chicago Style (17th ed.) Citation

Liu, Shunchang, Xin Chen, Belen Martin Urcelay, and Francesco Croce. Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders. 2026.

MLA (9th ed.) Citation

Liu, Shunchang, et al. Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders. 2026.
