| Main Authors: | Wang, Hao, Li, Hao, Huang, Minlie, Sha, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.16006 |
Similar Items
BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
by: Wang, Xinyuan, et al.
Published: (2024)
DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
by: Wang, Hao, et al.
Published: (2024)
Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation
by: Kim, Minkyoung, et al.
Published: (2024)
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
by: Zhang, Zhexin, et al.
Published: (2023)
Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
by: Mu, Junjie, et al.
Published: (2025)
ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs
by: Ni, Ziyi, et al.
Published: (2025)
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)
LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
by: Li, Ran, et al.
Published: (2025)
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
by: Zhang, Zhexin, et al.
Published: (2024)
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
by: Meng, Wenlong, et al.
Published: (2025)
From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks
by: Zhang, Zhexin, et al.
Published: (2024)
HSF: Defending against Jailbreak Attacks with Hidden State Filtering
by: Qian, Cheng, et al.
Published: (2024)
Harnessing the Plug-and-Play Controller by Prompting
by: Wang, Hao, et al.
Published: (2024)
Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers
by: Lin, Liang, et al.
Published: (2025)
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
by: Liao, Zeyi, et al.
Published: (2024)
AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
by: Kumar, Vishal, et al.
Published: (2024)
Activation-Guided Local Editing for Jailbreaking Attacks
by: Wang, Jiecong, et al.
Published: (2025)
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
by: Wang, Zijun, et al.
Published: (2024)
Defending LLMs against Jailbreaking Attacks via Backtranslation
by: Wang, Yihan, et al.
Published: (2024)
Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
by: Noughabi, Havva Alizadeh, et al.
Published: (2025)
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
by: Zhang, Zhexin, et al.
Published: (2025)
SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
by: Wong, Aidan, et al.
Published: (2024)
Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
by: Chen, Kexin, et al.
Published: (2024)
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
by: Luo, Xuan, et al.
Published: (2025)
Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding
by: Xiao, Zhongyu, et al.
Published: (2026)
Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay
by: Wang, Hao, et al.
Published: (2026)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
by: Lin, Yuping, et al.
Published: (2024)
Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs
by: Zhou, Yao, et al.
Published: (2026)
DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
by: Li, Yu, et al.
Published: (2025)
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
by: Li, Qizhang, et al.
Published: (2024)
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
by: Guo, Xingang, et al.
Published: (2024)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
by: Huang, Xijie, et al.
Published: (2024)
Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs
by: Li, Xiang, et al.
Published: (2025)
MiniPLM: Knowledge Distillation for Pre-Training Language Models
by: Gu, Yuxian, et al.
Published: (2024)
Language Models Hallucinate, but May Excel at Fact Verification
by: Guan, Jian, et al.
Published: (2023)
Large Language Models Are Not Robust Multiple Choice Selectors
by: Zheng, Chujie, et al.
Published: (2023)