:: Library Catalog

Bibliographic Details
Main Authors:	Tian, Yuan; Hu, Bing; Wu, Fang; Li, Xiaomin; Lu, Binghang; Gong, Neil Zhenqiang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2605.27932

Similar Items

Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)

GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
by: Xie, Yueqi, et al.
Published: (2024)

Provably Robust Federated Reinforcement Learning
by: Fang, Minghong, et al.
Published: (2025)

Certifiably Robust Image Watermark
by: Jiang, Zhengyuan, et al.
Published: (2024)

A Transfer Attack to Image Watermarks
by: Hu, Yuepeng, et al.
Published: (2024)

SafeText: Safe Text-to-image Models via Aligning the Text Encoder
by: Hu, Yuepeng, et al.
Published: (2025)

Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks
by: Xu, Yichang, et al.
Published: (2024)

Robustness of Vision Foundation Models to Common Perturbations
by: Liu, Hongbin, et al.
Published: (2026)

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
by: Shao, Zedian, et al.
Published: (2026)

Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
by: Jia, Yuqi, et al.
Published: (2024)

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
by: Xie, Yueqi, et al.
Published: (2024)

EditTrack: Detecting and Attributing AI-assisted Image Editing
by: Jiang, Zhengyuan, et al.
Published: (2025)

Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
by: Liu, Hongbin, et al.
Published: (2024)

Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework
by: Yin, Minglei, et al.
Published: (2023)

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
by: Liu, Yinuo, et al.
Published: (2025)

Refusing Safe Prompts for Multi-modal Large Language Models
by: Shao, Zedian, et al.
Published: (2024)

Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)

CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
by: Zhang, Jinghuai, et al.
Published: (2022)

Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
by: Wang, Jiongxiao, et al.
Published: (2024)

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
by: Zhu, Kaijie, et al.
Published: (2023)

Competitive Advantage Attacks to Decentralized Federated Learning
by: Jia, Yuqi, et al.
Published: (2023)

Watermark-based Attribution of AI-Generated Content
by: Jiang, Zhengyuan, et al.
Published: (2024)

Jailbreak Distillation: Renewable Safety Benchmarking
by: Zhang, Jingyu, et al.
Published: (2025)

VideoMarkBench: Benchmarking Robustness of Video Watermarking
by: Jiang, Zhengyuan, et al.
Published: (2025)

PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
by: Shen, Guobin, et al.
Published: (2025)

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
by: Liu, Yupei, et al.
Published: (2023)

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
by: Zhang, Chiyu, et al.
Published: (2026)

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
by: Zhao, Weixiang, et al.
Published: (2025)

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
by: Wang, Reachal, et al.
Published: (2025)

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models
by: Hu, Yuepeng, et al.
Published: (2024)

PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
by: Wang, Yidan, et al.
Published: (2025)

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
by: Hu, Hanjiang, et al.
Published: (2025)

When Memory Becomes a Vulnerability: Towards Multi-turn Jailbreak Attacks against Text-to-Image Generation Systems
by: Zhao, Shiqian, et al.
Published: (2025)

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
by: Huang, Caishuang, et al.
Published: (2024)

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak
by: Gu, Haoran, et al.
Published: (2026)

MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
by: You, Wenhao, et al.
Published: (2025)

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
by: Feng, Yingchaojie, et al.
Published: (2024)

What Matters For Safety Alignment?
by: Li, Xing, et al.
Published: (2026)

Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models
by: Liu, Jiangtao, et al.
Published: (2025)