| Main Authors: | Ye, Xi, Liu, Yiwen, Wang, Lina, Wang, Run, Yang, Geying, Hou, Yufei, Yu, Jiayi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.07141 |
Similar Items
Metaphor-based Jailbreak Attacks on Text-to-Image Models
by: Zhang, Chenyu, et al.
Published: (2025)
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
by: Zhang, Chenyu, et al.
Published: (2025)
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
by: Li, Linbao, et al.
Published: (2025)
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
by: Ma, Jiachen, et al.
Published: (2024)
Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models
by: Zhao, Shiqian, et al.
Published: (2025)
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
by: Xiong, Chen, et al.
Published: (2024)
AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models
by: Wang, Yiming, et al.
Published: (2024)
Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards
by: Yan, Song, et al.
Published: (2025)
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
by: Li, Qizhang, et al.
Published: (2024)
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
by: Zhang, Zheng, et al.
Published: (2025)
Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)
Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models
by: Liu, Jiangtao, et al.
Published: (2025)
EVA: Editing for Versatile Alignment against Jailbreaks
by: Wang, Yi, et al.
Published: (2026)
Proactive defense against LLM Jailbreak
by: Zhao, Weiliang, et al.
Published: (2025)
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
by: Wu, Yixin, et al.
Published: (2023)
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)
Model-Editing-Based Jailbreak against Safety-aligned Large Language Models
by: Li, Yuxi, et al.
Published: (2024)
Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models
by: Wang, Yanyun, et al.
Published: (2026)
HTS-Attack: Heuristic Token Search for Jailbreaking Text-to-Image Models
by: Gao, Sensen, et al.
Published: (2024)
ShallowJail: Steering Jailbreaks against Large Language Models
by: Liu, Shang, et al.
Published: (2026)
Combinational Backdoor Attack against Customized Text-to-Image Models
by: Jiang, Wenbo, et al.
Published: (2024)
GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance
by: Zhang, Zaixi, et al.
Published: (2025)
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
by: Wang, Xunguang, et al.
Published: (2024)
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
by: Wang, Youze, et al.
Published: (2025)
SoK: Robustness in Large Language Models against Jailbreak Attacks
by: Xu, Feiyue, et al.
Published: (2026)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks
by: Zhuang, Zhixiong, et al.
Published: (2025)
Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs
by: Liu, Yize, et al.
Published: (2025)
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
by: Yu, Zhiyuan, et al.
Published: (2024)
PLA: Prompt Learning Attack against Text-to-Image Generative Models
by: Lyu, Xinqi, et al.
Published: (2025)
Towards Action Hijacking of Large Language Model-based Agent
by: Zhang, Yuyang, et al.
Published: (2024)
OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing
by: Chen, Jianming, et al.
Published: (2026)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)
Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
by: Wang, Zhaoqi, et al.
Published: (2025)
Automatic Jailbreaking of the Text-to-Image Generative AI Systems
by: Kim, Minseon, et al.
Published: (2024)
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
by: Deng, Gelei, et al.
Published: (2023)
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
by: Chen, Yulin, et al.
Published: (2025)
Transfer Learning of Real Image Features with Soft Contrastive Loss for Fake Image Detection
by: Liang, Ziyou, et al.
Published: (2024)
$PC^2$: Politically Controversial Content Generation via Jailbreaking Attacks on GPT-based Text-to-Image Models
by: Choi, Wonwoo, et al.
Published: (2026)