| Main Authors: | Wu, Zihui, Gao, Haichang, He, Jianping, Wang, Ping |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.17915 |
Similar Items
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
by: Lin, Zheng, et al.
Published: (2026)
Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)
SoK: Evaluating Jailbreak Guardrails for Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
Emoji-Based Jailbreaking of Large Language Models
by: Gopinadh, M P V S, et al.
Published: (2026)
SoK: Robustness in Large Language Models against Jailbreak Attacks
by: Xu, Feiyue, et al.
Published: (2026)
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)
Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models
by: Das, Badhan Chandra, et al.
Published: (2026)
NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models
by: Zhang, Chuhan, et al.
Published: (2025)
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
by: Wang, Yi, et al.
Published: (2025)
Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning
by: Wang, Zhaoqi, et al.
Published: (2026)
A Cross-Language Investigation into Jailbreak Attacks in Large Language Models
by: Li, Jie, et al.
Published: (2024)
Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads
by: Wu, Jinman, et al.
Published: (2026)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Behind the Mask: Benchmarking Camouflaged Jailbreaks in Large Language Models
by: Zheng, Youjia, et al.
Published: (2025)
ShallowJail: Steering Jailbreaks against Large Language Models
by: Liu, Shang, et al.
Published: (2026)
Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models
by: Xiong, Chen, et al.
Published: (2026)
Jailbreaking and Mitigation of Vulnerabilities in Large Language Models
by: Peng, Benji, et al.
Published: (2024)
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
by: Wang, Youze, et al.
Published: (2025)
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
by: Liu, Aofan, et al.
Published: (2025)
Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
by: Hong, Wenjing, et al.
Published: (2026)
Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
by: Zhang, Deyue, et al.
Published: (2025)
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
by: Xu, Zihao, et al.
Published: (2024)
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
by: Yang, Fan
Published: (2025)
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations
by: Wong, Ryan, et al.
Published: (2025)
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
by: Zhang, Wenhui, et al.
Published: (2025)
STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)
Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles
by: Wang, Zhilong, et al.
Published: (2024)
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
by: Chen, Bocheng, et al.
Published: (2024)
Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models
by: Yu, Yongcan, et al.
Published: (2025)
QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language
by: Zou, Qingsong, et al.
Published: (2025)
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
by: Teng, Ma, et al.
Published: (2024)
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
by: Park, Junyoung, et al.
Published: (2026)
AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks
by: Song, Weiming, et al.
Published: (2026)
Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models
by: Li, Yakai, et al.
Published: (2025)
A Systematic Security Evaluation of OpenClaw and Its Variants
by: Wang, Yuhang, et al.
Published: (2026)
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)
Hidden You Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Logic Chain Injection
by: Wang, Zhilong, et al.
Published: (2024)
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
by: Liu, Yanjiang, et al.
Published: (2025)