:: Library Catalog

Bibliographic Details
Main Authors:	Ramesh, Govind; Dou, Yao; Xu, Wei
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2405.13077

Similar Items

Low-Resource Languages Jailbreak GPT-4
by: Yong, Zheng-Xin, et al.
Published: (2023)

Defending against Jailbreak through Early Exit Generation of Large Language Models
by: Zhao, Chongwen, et al.
Published: (2024)

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space
by: Huang, Yao, et al.
Published: (2025)

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
by: Lv, Huijie, et al.
Published: (2024)

SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
by: Dong, Xiaoning, et al.
Published: (2024)

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)

Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)

Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
by: Ouyang, Yang, et al.
Published: (2025)

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)

QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language
by: Zou, Qingsong, et al.
Published: (2025)

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
by: Ran, Delong, et al.
Published: (2024)

Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
by: Zhao, Wei, et al.
Published: (2024)

How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation
by: Long, Zhuohang, et al.
Published: (2025)

SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters
by: Yang, Yan, et al.
Published: (2024)

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
by: Zhao, Shiji, et al.
Published: (2025)

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning
by: Sel, Bilgehan, et al.
Published: (2026)

LLM Jailbreak Detection for (Almost) Free!
by: Chen, Guorui, et al.
Published: (2025)

MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
by: Wang, Fengxiang, et al.
Published: (2024)

Using Hallucinations to Bypass GPT4's Filter
by: Lemkin, Benjamin
Published: (2024)

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
by: Murphy, Brendan, et al.
Published: (2025)

ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
by: Xu, Zhangchen, et al.
Published: (2024)

Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models
by: Xu, Zihao, et al.
Published: (2024)

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
by: Fang, Zhicheng, et al.
Published: (2026)

Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)

Large Reasoning Models Are Autonomous Jailbreak Agents
by: Hagendorff, Thilo, et al.
Published: (2025)

Activation-Guided Local Editing for Jailbreaking Attacks
by: Wang, Jiecong, et al.
Published: (2025)

Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
by: Xing, Wenpeng, et al.
Published: (2025)

Distract Large Language Models for Automatic Jailbreak Attack
by: Xiao, Zeguan, et al.
Published: (2024)

Geneshift: Impact of different scenario shift on Jailbreaking LLM
by: Wu, Tianyi, et al.
Published: (2025)

Jailbreaking Large Language Models Through Content Concretization
by: Wahréus, Johan, et al.
Published: (2025)

Jailbreaking Frontier Foundation Models Through Intention Deception
by: Wang, Xinhe, et al.
Published: (2026)

Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
by: Das, Saswat, et al.
Published: (2025)

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
by: Pu, Rui, et al.
Published: (2024)

Is the System Message Really Important to Jailbreaks in Large Language Models?
by: Zou, Xiaotian, et al.
Published: (2024)

Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
by: Gibbs, Tom, et al.
Published: (2024)

LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
by: Shen, Guangyu, et al.
Published: (2024)

Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
by: Ji, Wence, et al.
Published: (2025)