Zhang, S., Shi, W., Li, S., Liao, J., Cai, H., & Wang, X. (2025). Interpretable Reward Model via Sparse Autoencoder.
Chicago Style (17th ed.) Citation
Zhang, Shuyi, Wei Shi, Sihang Li, Jiayi Liao, Hengxing Cai, and Xiang Wang. Interpretable Reward Model via Sparse Autoencoder. 2025.
MLA (9th ed.) Citation
Zhang, Shuyi, et al. Interpretable Reward Model via Sparse Autoencoder. 2025.