Library Catalog

Bibliographic Details
Main Authors:	Jeong, Joonhyun; Bae, Seyun; Jung, Yeonsung; Hwang, Jaeryong; Yang, Eunho
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2503.20823

Similar Items

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
by: Abdali, Sara, et al.
Published: (2024)

PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs
by: Wang, Xinkai, et al.
Published: (2025)

DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs
by: Xu, Wenzhuo, et al.
Published: (2026)

Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application
by: Xu, Wenzhuo, et al.
Published: (2025)

ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs
by: Liu, Xu, et al.
Published: (2025)

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense
by: Chen, Zejian, et al.
Published: (2026)

LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs
by: Jha, Piyush, et al.
Published: (2024)

Mitigating Jailbreaks with Intent-Aware LLMs
by: Yeo, Wei Jie, et al.
Published: (2025)

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
by: Pu, Rui, et al.
Published: (2024)

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
by: Wang, Yanting, et al.
Published: (2025)

Dagger Behind Smile: Fool LLMs with a Happy Ending Story
by: Song, Xurui, et al.
Published: (2025)

FlipAttack: Jailbreak LLMs via Flipping
by: Liu, Yue, et al.
Published: (2024)

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
by: Guo, Xuyang, et al.
Published: (2025)

The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning
by: Liu, Mingrui, et al.
Published: (2025)

Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt
by: Jenny, Maël, et al.
Published: (2026)

JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)

Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs
by: Xiang, Shiyu, et al.
Published: (2025)

You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense
by: Mai, Wuyuao, et al.
Published: (2025)

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
by: Xie, Zhixin, et al.
Published: (2025)

PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
by: Nakka, Krishna Kanth, et al.
Published: (2025)

PUZZLED: Jailbreaking LLMs through Word-Based Puzzles
by: Ahn, Yelim, et al.
Published: (2025)

Enhancing Jailbreak Attacks on LLMs via Persona Prompts
by: Zhang, Zheng, et al.
Published: (2025)

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
by: Li, Haoyang, et al.
Published: (2025)

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
by: Jaiswal, Piyush, et al.
Published: (2026)

Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)

A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos
by: Yao, Yang, et al.
Published: (2025)

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
by: Qraitem, Maan, et al.
Published: (2024)

Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
by: Xiong, Chen, et al.
Published: (2024)

RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
by: Chen, Xuan, et al.
Published: (2024)

A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
by: Luo, Xuan, et al.
Published: (2025)

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)

Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses
by: Shang, Zhengchun, et al.
Published: (2025)

Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity
by: Husain, Bilal Saleh
Published: (2025)

Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion
by: Cui, Tiehan, et al.
Published: (2025)

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)

PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs
by: Gong, Xueluan, et al.
Published: (2024)

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs
by: Yoon, Sangyeon, et al.
Published: (2026)

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
by: Guo, Weiyang, et al.
Published: (2026)

TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning
by: Xie, Zhixin, et al.
Published: (2026)

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
by: Liu, Xiaogeng, et al.
Published: (2024)