| Main Authors: | Ramesh, Govind, Dou, Yao, Xu, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.13077 |
Similar Items
Low-Resource Languages Jailbreak GPT-4
by: Yong, Zheng-Xin, et al.
Published: (2023)
Defending against Jailbreak through Early Exit Generation of Large Language Models
by: Zhao, Chongwen, et al.
Published: (2024)
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space
by: Huang, Yao, et al.
Published: (2025)
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
by: Lv, Huijie, et al.
Published: (2024)
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
by: Dong, Xiaoning, et al.
Published: (2024)
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
by: Ouyang, Yang, et al.
Published: (2025)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language
by: Zou, Qingsong, et al.
Published: (2025)
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)
Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
by: Zhao, Wei, et al.
Published: (2024)
How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation
by: Long, Zhuohang, et al.
Published: (2025)
SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters
by: Yang, Yan, et al.
Published: (2024)
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
by: Zhao, Shiji, et al.
Published: (2025)
Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning
by: Sel, Bilgehan, et al.
Published: (2026)
LLM Jailbreak Detection for (Almost) Free!
by: Chen, Guorui, et al.
Published: (2025)
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
by: Wang, Fengxiang, et al.
Published: (2024)
Using Hallucinations to Bypass GPT4's Filter
by: Lemkin, Benjamin
Published: (2024)
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
by: Murphy, Brendan, et al.
Published: (2025)
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
by: Xu, Zhangchen, et al.
Published: (2024)
Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models
by: Xu, Zihao, et al.
Published: (2024)
Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
by: Fang, Zhicheng, et al.
Published: (2026)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
Large Reasoning Models Are Autonomous Jailbreak Agents
by: Hagendorff, Thilo, et al.
Published: (2025)
Activation-Guided Local Editing for Jailbreaking Attacks
by: Wang, Jiecong, et al.
Published: (2025)
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
by: Xing, Wenpeng, et al.
Published: (2025)
Distract Large Language Models for Automatic Jailbreak Attack
by: Xiao, Zeguan, et al.
Published: (2024)
Geneshift: Impact of different scenario shift on Jailbreaking LLM
by: Wu, Tianyi, et al.
Published: (2025)
Jailbreaking Large Language Models Through Content Concretization
by: Wahréus, Johan, et al.
Published: (2025)
Jailbreaking Frontier Foundation Models Through Intention Deception
by: Wang, Xinhe, et al.
Published: (2026)
Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
by: Das, Saswat, et al.
Published: (2025)
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
by: Pu, Rui, et al.
Published: (2024)
Is the System Message Really Important to Jailbreaks in Large Language Models?
by: Zou, Xiaotian, et al.
Published: (2024)
Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
by: Gibbs, Tom, et al.
Published: (2024)
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)
Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
by: Shen, Guangyu, et al.
Published: (2024)
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
by: Ji, Wence, et al.
Published: (2025)