| Main Authors: | Yu, Xucheng; Jin, Haibo; Zeng, Huimin; Wang, Haohan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.21948 |
Similar Items
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
by: Zhou, Andy, et al.
Published: (2024)
Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
by: Guan, Jiwei, et al.
Published: (2026)
MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation
by: Jiang, Weisen, et al.
Published: (2025)
FedDefender: Backdoor Attack Defense in Federated Learning
by: Gill, Waris, et al.
Published: (2023)
TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization
by: Fu, Lucheng, et al.
Published: (2026)
Controlling Output Rankings in Generative Engines for LLM-based Search
by: Jin, Haibo, et al.
Published: (2026)
Defending the Edge: Representative-Attention Defense against Backdoor Attacks in Federated Learning
by: Obioma, Chibueze Peace, et al.
Published: (2025)
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
by: Jin, Haibo, et al.
Published: (2024)
Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers
by: Du, Yixuan, et al.
Published: (2026)
Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks
by: Feng, Yuyuan, et al.
Published: (2025)
Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain
by: Cheng, Gang, et al.
Published: (2025)
GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing
by: Zhang, Peiyan, et al.
Published: (2025)
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
by: Zeng, Yifan, et al.
Published: (2024)
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
by: Jin, Haibo, et al.
Published: (2024)
Defending against Backdoor Attack on Deep Neural Networks
by: Cheng, Hao, et al.
Published: (2020)
SCI-Verifier: Scientific Verifier with Thinking
by: Zheng, Shenghe, et al.
Published: (2025)
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models
by: Yu, Ye, et al.
Published: (2025)
Bridging Models to Defend: A Population-Based Strategy for Robust Adversarial Defense
by: Wang, Ren, et al.
Published: (2023)
Invariant Aggregator for Defending against Federated Backdoor Attacks
by: Wang, Xiaoyang, et al.
Published: (2022)
Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
by: Edemacu, Kennedy, et al.
Published: (2025)
REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
by: Zhang, Peiyan, et al.
Published: (2024)
PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks
by: Guo, Ping, et al.
Published: (2024)
GEO: Generative Engine Optimization
by: Aggarwal, Pranjal, et al.
Published: (2023)
CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification
by: Mu, Fangwen, et al.
Published: (2024)
Filter, Obstruct and Dilute: Defending Against Backdoor Attacks on Semi-Supervised Learning
by: Wang, Xinrui, et al.
Published: (2025)
Defending Deep Regression Models against Backdoor Attacks
by: Du, Lingyu, et al.
Published: (2024)
FL-Defender: Combating Targeted Attacks in Federated Learning
by: Jebreel, Najeeb, et al.
Published: (2022)
Your Agent Can Defend Itself against Backdoor Attacks
by: Changjiang, Li, et al.
Published: (2025)
SCI: A Metacognitive Control for Signal Dynamics
by: Meesala, Vishal Joshua
Published: (2025)
Revisiting Gradient Pruning: A Dual Realization for Defending against Gradient Attacks
by: Xue, Lulu, et al.
Published: (2024)
Noise Masking Attacks and Defenses for Pretrained Speech Models
by: Jagielski, Matthew, et al.
Published: (2024)
Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning
by: Wang, Yujing, et al.
Published: (2024)
Defending Against Indirect Prompt Injection Attacks With Spotlighting
by: Hines, Keegan, et al.
Published: (2024)
HashVFL: Defending Against Data Reconstruction Attacks in Vertical Federated Learning
by: Qiu, Pengyu, et al.
Published: (2022)
Defending Against Poisoning Attacks in Federated Learning with Blockchain
by: Dong, Nanqing, et al.
Published: (2023)
Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend
by: Liu, Yige, et al.
Published: (2026)
Defending Membership Inference Attacks via Privacy-aware Sparsity Tuning
by: Hu, Qiang, et al.
Published: (2024)
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
by: Robey, Alexander, et al.
Published: (2023)
MEA-Defender: A Robust Watermark against Model Extraction Attack
by: Lv, Peizhuo, et al.
Published: (2024)
Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models
by: Kulkarni, Prashant, et al.
Published: (2025)