| Main Authors: | Li, Xiangman, Wu, Xiaodong, Li, Qi, Ni, Jianbing, Lu, Rongxing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.15182 |
Similar Items
SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts
by: Wu, Xiaodong, et al.
Published: (2025)
SoK: A Comprehensive Security Analysis of Jailbreak Resilience in GPT and DeepSeek Models
by: Wu, Xiaodong, et al.
Published: (2025)
Robustness of Watermarking on Text-to-Image Diffusion Models
by: Wu, Xiaodong, et al.
Published: (2024)
PDLRecover: Privacy-preserving Decentralized Model Recovery with Machine Unlearning
by: Li, Xiangman, et al.
Published: (2025)
Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing
by: Wu, Xiaodong, et al.
Published: (2026)
When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting
by: Wu, Xiaodong, et al.
Published: (2025)
Jailbreaking Attack against Multimodal Large Language Model
by: Niu, Zhenxing, et al.
Published: (2024)
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)
Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models
by: Li, Xiao, et al.
Published: (2024)
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
by: Huang, Tiansheng, et al.
Published: (2024)
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
by: Robey, Alexander, et al.
Published: (2023)
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
by: Zeng, Yifan, et al.
Published: (2024)
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
by: Djanibekov, Amirbek, et al.
Published: (2025)
Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack
by: Huang, Tiansheng, et al.
Published: (2024)
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
by: Liu, Guozhi, et al.
Published: (2025)
SUA: Stealthy Multimodal Large Language Model Unlearning Attack
by: Zhang, Xianren, et al.
Published: (2025)
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
by: Yi, Sibo, et al.
Published: (2024)
Model-Editing-Based Jailbreak against Safety-aligned Large Language Models
by: Li, Yuxi, et al.
Published: (2024)
Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation
by: Liu, Guozhi, et al.
Published: (2024)
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
by: Wang, Xiangwen, et al.
Published: (2026)
From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks
by: Zhang, Zhexin, et al.
Published: (2024)
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
by: Chen, Yiwei, et al.
Published: (2025)
Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?
by: Li, Zexi, et al.
Published: (2025)
On Large Language Model Continual Unlearning
by: Gao, Chongyang, et al.
Published: (2024)
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
by: Hu, Zixuan, et al.
Published: (2025)
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
by: Cui, Kaiyuan, et al.
Published: (2026)
The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
by: Yang, Xikang, et al.
Published: (2024)
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
by: Chen, Taiye, et al.
Published: (2025)
TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice
by: Goel, Aman, et al.
Published: (2025)
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
by: Lin, Shi, et al.
Published: (2024)
Semantic Membership Inference Attack against Large Language Models
by: Mozaffari, Hamid, et al.
Published: (2024)
Protecting the Neural Networks against FGSM Attack Using Machine Unlearning
by: Khorasani, Amir Hossein, et al.
Published: (2025)
Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
by: Wei, Rongzhe, et al.
Published: (2024)
Membership Inference Attacks against Large Vision-Language Models
by: Li, Zhan, et al.
Published: (2024)
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
by: Huang, Tiansheng, et al.
Published: (2024)
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
by: Li, Xuan, et al.
Published: (2023)
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
by: Tran, Toan, et al.
Published: (2025)
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
by: Nikolić, Kristina, et al.
Published: (2025)
Per-parameter Task Arithmetic for Unlearning in Large Language Models
by: Cai, Chengyi, et al.
Published: (2026)
Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)