Farzam, A., Behabahani, M., Malek, M., Nevmyvaka, Y., & Sapiro, G. (2026). Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement.
Chicago Style (17th ed.) Citation: Farzam, Amirhossein, Majid Behabahani, Mani Malek, Yuriy Nevmyvaka, and Guillermo Sapiro. Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement. 2026.
MLA (9th ed.) Citation: Farzam, Amirhossein, et al. Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement. 2026.