| Main Authors: | Su, Guangzhi; Huang, Shuchang; Ke, Yutong; Liu, Zhuohang; Qian, Long; Huang, Kaizhu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.26830 |
Similar Items
Defending against Jailbreak through Early Exit Generation of Large Language Models
by: Zhao, Chongwen, et al.
Published: (2024)
KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing
by: Xu, Zhenhua, et al.
Published: (2026)
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails
by: Mangaokar, Neal, et al.
Published: (2024)
Recent Advances in Attack and Defense Approaches of Large Language Models
by: Cui, Jing, et al.
Published: (2024)
JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
by: Zhang, Shenyi, et al.
Published: (2025)
PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification
by: Gong, Guangyu, et al.
Published: (2026)
A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking
by: Huang, Chaoyue, et al.
Published: (2025)
Enhanced Privacy Leakage from Noise-Perturbed Gradients via Gradient-Guided Conditional Diffusion Models
by: Meng, Jiayang, et al.
Published: (2025)
AGNNCert: Defending Graph Neural Networks against Arbitrary Perturbations with Deterministic Certification
by: Li, Jiate, et al.
Published: (2025)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models
by: Ying, Zonghao, et al.
Published: (2024)
Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?
by: Lowy, Andrew, et al.
Published: (2024)
Invariant Aggregator for Defending against Federated Backdoor Attacks
by: Wang, Xiaoyang, et al.
Published: (2022)
TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models
by: Guo, Zhen, et al.
Published: (2026)
TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
by: Tang, Duoxun, et al.
Published: (2025)
SmartGuard: Leveraging Large Language Models for Network Attack Detection through Audit Log Analysis and Summarization
by: Zhang, Hao, et al.
Published: (2025)
DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation
by: Huang, Li, et al.
Published: (2026)
Large Language Models are Autonomous Cyber Defenders
by: Castro, Sebastián R., et al.
Published: (2025)
Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models
by: Lai, Songning, et al.
Published: (2024)
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations
by: Wong, Ryan, et al.
Published: (2025)
Defending Large Language Models Against Attacks With Residual Stream Activation Analysis
by: Kawasaki, Amelia, et al.
Published: (2024)
Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
by: Chen, Ruoxi, et al.
Published: (2021)
Smooth Sensitivity for Geo-Privacy
by: Liang, Yuting, et al.
Published: (2024)
How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation
by: Long, Zhuohang, et al.
Published: (2025)
BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks
by: Lee, Hanyong, et al.
Published: (2025)
ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
by: Wang, Zihan, et al.
Published: (2025)
EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
by: Chang, Jung-Woo, et al.
Published: (2024)
Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning
by: Wang, Yujing, et al.
Published: (2024)
EnCAgg: Enhanced Clustering Aggregation for Robust Federated Learning against Dynamic Model Poisoning
by: Zhang, Tianyun, et al.
Published: (2026)
Privacy Loss of Noise Perturbation via Concentration Analysis of A Product Measure
by: Liu, Shuainan, et al.
Published: (2025)
Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models
by: Peng, Wanli, et al.
Published: (2025)
WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents
by: Chen, Yulin, et al.
Published: (2026)
Secure Distributed Learning for CAVs: Defending Against Gradient Leakage with Leveled Homomorphic Encryption
by: Najjar, Muhammad Ali, et al.
Published: (2025)
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
by: Teng, Ma, et al.
Published: (2024)
Towards Robust Multimodal Large Language Models Against Jailbreak Attacks
by: Yin, Ziyi, et al.
Published: (2025)
On Calibration of LLM-based Guard Models for Reliable Content Moderation
by: Liu, Hongfu, et al.
Published: (2024)
Defend LLMs Through Self-Consciousness
by: Huang, Boshi, et al.
Published: (2025)
ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning
by: Guan, Shaowei, et al.
Published: (2025)
ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
by: Zhao, Yunhan, et al.
Published: (2026)
Cross-Modal Backdoors in Multimodal Large Language Models
by: Wang, Runhe, et al.
Published: (2026)
JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks
by: Zhang, Xiaoyu, et al.
Published: (2023)