| Main Authors: | Zhang, Wenhui, Xu, Huiyu, Wang, Zhibo, Li, Zhichao, He, Zeqing, Wei, Xuelin, Ren, Kui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21380 |
Similar Items
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
by: Zhang, Wenhui, et al.
Published: (2025)
Interpretable LLM Guardrails via Sparse Representation Steering
by: He, Zeqing, et al.
Published: (2025)
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
by: He, Zeqing, et al.
Published: (2024)
LoopTrap: Termination Poisoning Attacks on LLM Agents
by: Xu, Huiyu, et al.
Published: (2026)
PT-Mark: Invisible Watermarking for Text-to-image Diffusion Models via Semantic-aware Pivotal Tuning
by: Wang, Yaopeng, et al.
Published: (2025)
Dynamic Dual-level Defense Routing for Continual Adversarial Training
by: Wang, Wenxuan, et al.
Published: (2025)
Rerouting LLM Routers
by: Shafran, Avital, et al.
Published: (2025)
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)
LoRA-Key: User-Centric LoRA Watermarking for Text-to-Image Diffusion Models
by: Wang, Yaopeng, et al.
Published: (2026)
Safeguarding LLM Embeddings in End-Cloud Collaboration via Entropy-Driven Perturbation
by: Jin, Shuaifan, et al.
Published: (2025)
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)
The Communication-Friendly Privacy-Preserving Machine Learning against Malicious Adversaries
by: Lu, Tianpei, et al.
Published: (2024)
Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection
by: Lin, Lixing, et al.
Published: (2026)
Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing
by: Langiu, Alessio
Published: (2026)
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents
by: Xiao, Wenjie, et al.
Published: (2026)
CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage
by: Mashnoor, Nowfel, et al.
Published: (2025)
SWAT: A System-Wide Approach to Tunable Leakage Mitigation in Encrypted Data Stores
by: Zheng, Leqian, et al.
Published: (2023)
CipherGuard: Compiler-aided Mitigation against Ciphertext Side-channel Attacks
by: Jiang, Ke, et al.
Published: (2025)
Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
by: Chen, Mingjie, et al.
Published: (2025)
Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts
by: Hasan, Md. Mehedi, et al.
Published: (2025)
JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks
by: Zhang, Xiaoyu, et al.
Published: (2023)
"Training robust watermarking model may hurt authentication!" Exploring and Mitigating the Identity Leakage in Robust Watermarking
by: Zhang, Xinyu, et al.
Published: (2026)
MindGuard: Intrinsic Decision Inspection for Securing LLM Agents Against Metadata Poisoning
by: Wang, Zhiqiang, et al.
Published: (2025)
RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection
by: Zhu, He, et al.
Published: (2026)
When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion
by: Li, Jiaqing, et al.
Published: (2026)
AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations
by: He, Yu, et al.
Published: (2026)
LLM Security Guard for Code
by: Kavian, Arya, et al.
Published: (2024)
PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
by: Shen, Guobin, et al.
Published: (2025)
WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents
by: Chen, Yulin, et al.
Published: (2026)
RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry
by: Lv, Bo, et al.
Published: (2026)
SAGE: Sample-Aware Guarding Engine for Robust Intrusion Detection Against Adversarial Attacks
by: Chen, Jing, et al.
Published: (2025)
GuardFS: a File System for Integrated Detection and Mitigation of Linux-based Ransomware
by: von der Assen, Jan, et al.
Published: (2024)
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory
by: Wei, Qianshan, et al.
Published: (2025)
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
by: Zhao, Wei, et al.
Published: (2026)
Breaking Secure Aggregation: Label Leakage from Aggregated Gradients in Federated Learning
by: Wang, Zhibo, et al.
Published: (2024)
Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack
by: Jin, Zhibo, et al.
Published: (2024)
DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks
by: Zhu, Zhiyu, et al.
Published: (2024)
ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning
by: Guan, Shaowei, et al.
Published: (2025)
Grimlock: Guarding High-Agency Systems with eBPF and Attested Channels
by: Wu, Qiancheng, et al.
Published: (2026)
Adversarial Threat Vectors and Risk Mitigation for Retrieval-Augmented Generation Systems
by: Ward, Chris M., et al.
Published: (2025)