| Main Authors: | Ahn, Yelim, Lee, Jaejin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.01306 |
Similar Items
Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity
by: Husain, Bilal Saleh
Published: (2025)
Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion
by: Cui, Tiehan, et al.
Published: (2025)
FlipAttack: Jailbreak LLMs via Flipping
by: Liu, Yue, et al.
Published: (2024)
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
by: Zhang, Zheng, et al.
Published: (2025)
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
by: Li, Haoyang, et al.
Published: (2025)
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
by: Jaiswal, Piyush, et al.
Published: (2026)
Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)
Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses
by: Shang, Zhengchun, et al.
Published: (2025)
PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
by: Gong, Xueluan, et al.
Published: (2024)
Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs
by: Yoon, Sangyeon, et al.
Published: (2026)
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
by: Guo, Weiyang, et al.
Published: (2026)
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
by: Pu, Rui, et al.
Published: (2024)
Data to Defense: The Role of Curation in Customizing LLMs Against Jailbreaking Attacks
by: Liu, Xiaoqun, et al.
Published: (2024)
"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios
by: Sun, Zhen, et al.
Published: (2025)
DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs
by: Xu, Wenzhuo, et al.
Published: (2026)
Emoji-Based Jailbreaking of Large Language Models
by: Gopinadh, M P V S, et al.
Published: (2026)
PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
by: Bae, Yubeen, et al.
Published: (2025)
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
by: Li, Xiaohu, et al.
Published: (2025)
Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs
by: Yan, Yu, et al.
Published: (2025)
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
by: Tong, Haibo, et al.
Published: (2025)
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
by: Wang, Xunguang, et al.
Published: (2024)
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
by: Chen, Zhuowei, et al.
Published: (2025)
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)
Automatic Jailbreaking of the Text-to-Image Generative AI Systems
by: Kim, Minseon, et al.
Published: (2024)
One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs
by: Tan, Yixin, et al.
Published: (2025)
LLMs Can Defend Themselves Against Jailbreaking in a Practical Manner: A Vision Paper
by: Wu, Daoyuan, et al.
Published: (2024)
Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs
by: Ling, Zijian, et al.
Published: (2026)
Untargeted Jailbreak Attack
by: Huang, Xinzhe, et al.
Published: (2025)
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
by: Shang, Shang, et al.
Published: (2024)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
Evaluating Jailbreaking Vulnerabilities in LLMs Deployed as Assistants for Smart Grid Operations: A Benchmark Against NERC Standards
by: Hammadia, Taha, et al.
Published: (2026)
JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
by: Li, Hongyi, et al.
Published: (2024)
Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning
by: Wang, Zhaoqi, et al.
Published: (2026)
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
by: Wang, Peiran, et al.
Published: (2024)
Jailbreaking LLMs via Calibration
by: Lu, Yuxuan, et al.
Published: (2026)
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
by: Park, Junyoung, et al.
Published: (2026)
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
by: Ji, Wence, et al.
Published: (2025)
SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters
by: Yang, Yan, et al.
Published: (2024)