Góral, G., Winkels, M., & Basart, S. (2025). Depth-Wise Activation Steering for Honest Language Models.
Chicago Style (17th ed.) Citation: Góral, Gracjan, Marysia Winkels, and Steven Basart. Depth-Wise Activation Steering for Honest Language Models. 2025.
MLA (9th ed.) Citation: Góral, Gracjan, et al. Depth-Wise Activation Steering for Honest Language Models. 2025.