| Main Authors: | Tian, Yuan, Hu, Bing, Wu, Fang, Li, Xiaomin, Lu, Binghang, Gong, Neil Zhenqiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.27932 |
Similar Items
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
by: Xie, Yueqi, et al.
Published: (2024)
Provably Robust Federated Reinforcement Learning
by: Fang, Minghong, et al.
Published: (2025)
Certifiably Robust Image Watermark
by: Jiang, Zhengyuan, et al.
Published: (2024)
A Transfer Attack to Image Watermarks
by: Hu, Yuepeng, et al.
Published: (2024)
SafeText: Safe Text-to-image Models via Aligning the Text Encoder
by: Hu, Yuepeng, et al.
Published: (2025)
Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks
by: Xu, Yichang, et al.
Published: (2024)
Robustness of Vision Foundation Models to Common Perturbations
by: Liu, Hongbin, et al.
Published: (2026)
Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
by: Shao, Zedian, et al.
Published: (2026)
Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
by: Jia, Yuqi, et al.
Published: (2024)
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
by: Xie, Yueqi, et al.
Published: (2024)
EditTrack: Detecting and Attributing AI-assisted Image Editing
by: Jiang, Zhengyuan, et al.
Published: (2025)
Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
by: Liu, Hongbin, et al.
Published: (2024)
Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework
by: Yin, Minglei, et al.
Published: (2023)
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)
WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
by: Liu, Yinuo, et al.
Published: (2025)
Refusing Safe Prompts for Multi-modal Large Language Models
by: Shao, Zedian, et al.
Published: (2024)
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)
CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
by: Zhang, Jinghuai, et al.
Published: (2022)
Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
by: Wang, Jiongxiao, et al.
Published: (2024)
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
by: Zhu, Kaijie, et al.
Published: (2023)
Competitive Advantage Attacks to Decentralized Federated Learning
by: Jia, Yuqi, et al.
Published: (2023)
Watermark-based Attribution of AI-Generated Content
by: Jiang, Zhengyuan, et al.
Published: (2024)
Jailbreak Distillation: Renewable Safety Benchmarking
by: Zhang, Jingyu, et al.
Published: (2025)
VideoMarkBench: Benchmarking Robustness of Video Watermarking
by: Jiang, Zhengyuan, et al.
Published: (2025)
PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
by: Shen, Guobin, et al.
Published: (2025)
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
by: Liu, Yupei, et al.
Published: (2023)
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
by: Zhang, Chiyu, et al.
Published: (2026)
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
by: Zhao, Weixiang, et al.
Published: (2025)
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
by: Wang, Reachal, et al.
Published: (2025)
Stable Signature is Unstable: Removing Image Watermark from Diffusion Models
by: Hu, Yuepeng, et al.
Published: (2024)
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
by: Wang, Yidan, et al.
Published: (2025)
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
by: Hu, Hanjiang, et al.
Published: (2025)
When Memory Becomes a Vulnerability: Towards Multi-turn Jailbreak Attacks against Text-to-Image Generation Systems
by: Zhao, Shiqian, et al.
Published: (2025)
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
by: Huang, Caishuang, et al.
Published: (2024)
Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak
by: Gu, Haoran, et al.
Published: (2026)
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
by: You, Wenhao, et al.
Published: (2025)
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
by: Feng, Yingchaojie, et al.
Published: (2024)
What Matters For Safety Alignment?
by: Li, Xing, et al.
Published: (2026)
Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models
by: Liu, Jiangtao, et al.
Published: (2025)