| Main Authors: | Jeong, Joonhyun, Bae, Seyun, Jung, Yeonsung, Hwang, Jaeryong, Yang, Eunho |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.20823 |
Similar Items
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
by: Abdali, Sara, et al.
Published: (2024)
PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs
by: Wang, Xinkai, et al.
Published: (2025)
DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs
by: Xu, Wenzhuo, et al.
Published: (2026)
Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application
by: Xu, Wenzhuo, et al.
Published: (2025)
ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs
by: Liu, Xu, et al.
Published: (2025)
Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense
by: Chen, Zejian, et al.
Published: (2026)
LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs
by: Jha, Piyush, et al.
Published: (2024)
Mitigating Jailbreaks with Intent-Aware LLMs
by: Yeo, Wei Jie, et al.
Published: (2025)
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
by: Pu, Rui, et al.
Published: (2024)
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
by: Wang, Yanting, et al.
Published: (2025)
Dagger Behind Smile: Fool LLMs with a Happy Ending Story
by: Song, Xurui, et al.
Published: (2025)
FlipAttack: Jailbreak LLMs via Flipping
by: Liu, Yue, et al.
Published: (2024)
Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
by: Guo, Xuyang, et al.
Published: (2025)
The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning
by: Liu, Mingrui, et al.
Published: (2025)
Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt
by: Jenny, Maël, et al.
Published: (2026)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs
by: Xiang, Shiyu, et al.
Published: (2025)
You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense
by: Mai, Wuyuao, et al.
Published: (2025)
Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
by: Xie, Zhixin, et al.
Published: (2025)
PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
by: Nakka, Krishna Kanth, et al.
Published: (2025)
PUZZLED: Jailbreaking LLMs through Word-Based Puzzles
by: Ahn, Yelim, et al.
Published: (2025)
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
by: Zhang, Zheng, et al.
Published: (2025)
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
by: Li, Haoyang, et al.
Published: (2025)
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
by: Jaiswal, Piyush, et al.
Published: (2026)
Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos
by: Yao, Yang, et al.
Published: (2025)
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
by: Qraitem, Maan, et al.
Published: (2024)
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
by: Xiong, Chen, et al.
Published: (2024)
RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
by: Chen, Xuan, et al.
Published: (2024)
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
by: Luo, Xuan, et al.
Published: (2025)
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)
Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses
by: Shang, Zhengchun, et al.
Published: (2025)
Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity
by: Husain, Bilal Saleh
Published: (2025)
Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion
by: Cui, Tiehan, et al.
Published: (2025)
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)
PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
by: Gong, Xueluan, et al.
Published: (2024)
Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs
by: Yoon, Sangyeon, et al.
Published: (2026)
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
by: Guo, Weiyang, et al.
Published: (2026)
TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning
by: Xie, Zhixin, et al.
Published: (2026)
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
by: Liu, Xiaogeng, et al.
Published: (2024)