| Main Authors: | Wang, Kongxin, Zhang, Jie, Qi, Peigui, Tang, Kunsheng, Zhang, Tianwei, Zhou, Wenbo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.02476 |
Similar Items
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
by: Qi, Peigui, et al.
Published: (2025)
Poly-Guard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset
by: Kang, Mintong, et al.
Published: (2025)
State-Dependent Safety Failures in Multi-Turn Language Model Interaction
by: Li, Pengcheng, et al.
Published: (2026)
Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks
by: Zhang, Minxing, et al.
Published: (2024)
ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
by: Zhao, Yunhan, et al.
Published: (2026)
Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
by: Li, Shuai, et al.
Published: (2023)
Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures
by: Su, Yanghao, et al.
Published: (2026)
GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio
by: Zhu, Zhenhao, et al.
Published: (2026)
CipherGuard: Compiler-aided Mitigation against Ciphertext Side-channel Attacks
by: Jiang, Ke, et al.
Published: (2025)
ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
by: Wen, Xiaofei, et al.
Published: (2025)
BURN: Backdoor Unlearning via Adversarial Boundary Analysis
by: Su, Yanghao, et al.
Published: (2025)
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
by: Feng, Weitao, et al.
Published: (2024)
On the Account Security Risks Posed by Password Strength Meters
by: Xu, Ming, et al.
Published: (2025)
From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI
by: Zhang, Zelin, et al.
Published: (2026)
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
by: Jin, Weifei, et al.
Published: (2025)
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
by: Liu, Zhe, et al.
Published: (2026)
Bag of Tricks for Subverting Reasoning-based Safety Guardrails
by: Chen, Shuo, et al.
Published: (2025)
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
by: Li, Hao, et al.
Published: (2024)
Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks
by: Huang, Zijian, et al.
Published: (2025)
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
by: Zhu, Boyu, et al.
Published: (2025)
OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)
A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)
InferDPT: Privacy-Preserving Inference for Closed-box Large Language Model
by: Tong, Meng, et al.
Published: (2023)
TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
by: Chu, Hua-Rong, et al.
Published: (2026)
ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
by: Wang, Zihan, et al.
Published: (2025)
Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
by: Hu, Runyi, et al.
Published: (2024)
Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)
TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models
by: Guo, Zhen, et al.
Published: (2026)
Peering Behind the Shield: Guardrail Identification in Large Language Models
by: Yang, Ziqing, et al.
Published: (2025)
Investigating Threats Posed by SMS Origin Spoofing to IoT Devices
by: Tsunoda, Akaki
Published: (2023)
The Gradient Puppeteer: Adversarial Domination in Gradient Leakage Attacks through Model Poisoning
by: Xiang, Kunlan, et al.
Published: (2025)
JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks
by: Zhang, Xiaoyu, et al.
Published: (2023)
OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
by: Wang, Thomas, et al.
Published: (2025)
Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs
by: Suo, Pan, et al.
Published: (2025)
PSRT: Accelerating LRM-based Guard Models via Prefilled Safe Reasoning Traces
by: Zhao, Jiawei, et al.
Published: (2025)
GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy
by: Minko, Bogdan, et al.
Published: (2026)
Oedipus: LLM-enchanced Reasoning CAPTCHA Solver
by: Deng, Gelei, et al.
Published: (2024)
Interpretable LLM Guardrails via Sparse Representation Steering
by: He, Zeqing, et al.
Published: (2025)
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
by: Deng, Gelei, et al.
Published: (2024)
SSD: A State-based Stealthy Backdoor Attack For Navigation System in UAV Route Planning
by: Wang, Zhaoxuan, et al.
Published: (2025)