| Main Authors: | Hong, Wenjing, Rong, Zhonghua, Wang, Li, Chang, Feng, Zhu, Jian, Tang, Ke, Zhu, Zexuan, Ong, Yew-Soon |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.20122 |
Similar Items
Evolving Skill-Structured Attack Memory Enhances LLM Jailbreaking
by: Zhang, Junke, et al.
Published: (2026)
Untargeted Jailbreak Attack
by: Huang, Xinzhe, et al.
Published: (2025)
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
by: Feng, Yingchaojie, et al.
Published: (2024)
Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses
by: Shang, Zhengchun, et al.
Published: (2025)
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)
SoK: Robustness in Large Language Models against Jailbreak Attacks
by: Xu, Feiyue, et al.
Published: (2026)
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
by: Lu, Ning, et al.
Published: (2025)
StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Organization Structures
by: Li, Bangxin, et al.
Published: (2024)
VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection
by: Mukhtar, Samal, et al.
Published: (2026)
Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models
by: Das, Badhan Chandra, et al.
Published: (2026)
Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives
by: Chang, Wenhan, et al.
Published: (2025)
SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking
by: Li, Jindong, et al.
Published: (2026)
Towards Robust Multimodal Large Language Models Against Jailbreak Attacks
by: Yin, Ziyi, et al.
Published: (2025)
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
by: Piet, Julien, et al.
Published: (2025)
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
by: Hao, Shuyang, et al.
Published: (2025)
Mitigating Many-shot Jailbreak Attacks with One Single Demonstration
by: Chen, Kejia, et al.
Published: (2026)
TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking
by: Zeng, Churui, et al.
Published: (2026)
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
by: Zhu, Sicheng, et al.
Published: (2024)
MultiKG: Multi-Source Threat Intelligence Aggregation for High-Quality Knowledge Graph Representation of Attack Techniques
by: Wang, Jian, et al.
Published: (2024)
PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs
by: Wang, Xinkai, et al.
Published: (2025)
AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
by: Lu, Lin, et al.
Published: (2024)
JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework
by: Zhou, Yuxuan, et al.
Published: (2025)
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
by: Mao, Yanxu, et al.
Published: (2025)
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
by: Wang, Youze, et al.
Published: (2025)
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
by: Deng, Gelei, et al.
Published: (2023)
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)
When LLMs Team Up: A Coordinated Attack Framework for Automated Cyber Intrusions
by: Qi, Minfeng, et al.
Published: (2026)
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
by: Wang, Xiangwen, et al.
Published: (2026)
HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models
by: Narula, Sidhant, et al.
Published: (2025)
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
by: Park, Junyoung, et al.
Published: (2026)
ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs
by: Liu, Xu, et al.
Published: (2025)
Multi-Objective Reinforcement Learning for Automated Resilient Cyber Defence
by: O'Driscoll, Ross, et al.
Published: (2024)
Large Language Model Adversarial Landscape Through the Lens of Attack Objectives
by: Wang, Nan, et al.
Published: (2025)
Jailbreaking Attack against Multimodal Large Language Model
by: Niu, Zhenxing, et al.
Published: (2024)
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
by: Zhong, Xingwei, et al.
Published: (2025)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
by: Li, Ziang, et al.
Published: (2025)
Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents
by: Ding, Renhua, et al.
Published: (2025)
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
by: Zhang, Wenhui, et al.
Published: (2025)
Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models
by: Kadali, Sri Durga Sai Sowmya, et al.
Published: (2026)