:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Kongxin; Zhang, Jie; Qi, Peigui; Tang, Kunsheng; Zhang, Tianwei; Zhou, Wenbo
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2508.02476

Similar Items

SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
by: Qi, Peigui, et al.
Published: (2025)

Poly-Guard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset
by: Kang, Mintong, et al.
Published: (2025)

State-Dependent Safety Failures in Multi-Turn Language Model Interaction
by: Li, Pengcheng, et al.
Published: (2026)

Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks
by: Zhang, Minxing, et al.
Published: (2024)

ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
by: Zhao, Yunhan, et al.
Published: (2026)

Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
by: Li, Shuai, et al.
Published: (2023)

Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures
by: Su, Yanghao, et al.
Published: (2026)

GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio
by: Zhu, Zhenhao, et al.
Published: (2026)

CipherGuard: Compiler-aided Mitigation against Ciphertext Side-channel Attacks
by: Jiang, Ke, et al.
Published: (2025)

ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails
by: Wen, Xiaofei, et al.
Published: (2025)

BURN: Backdoor Unlearning via Adversarial Boundary Analysis
by: Su, Yanghao, et al.
Published: (2025)

AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
by: Feng, Weitao, et al.
Published: (2024)

On the Account Security Risks Posed by Password Strength Meters
by: Xu, Ming, et al.
Published: (2025)

From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI
by: Zhang, Zelin, et al.
Published: (2026)

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
by: Jin, Weifei, et al.
Published: (2025)

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
by: Liu, Zhe, et al.
Published: (2026)

Bag of Tricks for Subverting Reasoning-based Safety Guardrails
by: Chen, Shuo, et al.
Published: (2025)

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
by: Li, Hao, et al.
Published: (2024)

Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks
by: Huang, Zijian, et al.
Published: (2025)

OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
by: Zhu, Boyu, et al.
Published: (2025)

OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)

A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)

InferDPT: Privacy-Preserving Inference for Closed-box Large Language Model
by: Tong, Meng, et al.
Published: (2023)

TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
by: Chu, Hua-Rong, et al.
Published: (2026)

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
by: Wang, Zihan, et al.
Published: (2025)

Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
by: Hu, Runyi, et al.
Published: (2024)

Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)

TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models
by: Guo, Zhen, et al.
Published: (2026)

Peering Behind the Shield: Guardrail Identification in Large Language Models
by: Yang, Ziqing, et al.
Published: (2025)

Investigating Threats Posed by SMS Origin Spoofing to IoT Devices
by: Tsunoda, Akaki
Published: (2023)

The Gradient Puppeteer: Adversarial Domination in Gradient Leakage Attacks through Model Poisoning
by: Xiang, Kunlan, et al.
Published: (2025)

JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks
by: Zhang, Xiaoyu, et al.
Published: (2023)

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
by: Wang, Thomas, et al.
Published: (2025)

Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs
by: Suo, Pan, et al.
Published: (2025)

PSRT: Accelerating LRM-based Guard Models via Prefilled Safe Reasoning Traces
by: Zhao, Jiawei, et al.
Published: (2025)

GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy
by: Minko, Bogdan, et al.
Published: (2026)

Oedipus: LLM-enchanced Reasoning CAPTCHA Solver
by: Deng, Gelei, et al.
Published: (2024)

Interpretable LLM Guardrails via Sparse Representation Steering
by: He, Zeqing, et al.
Published: (2025)

Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
by: Deng, Gelei, et al.
Published: (2024)

SSD: A State-based Stealthy Backdoor Attack For Navigation System in UAV Route Planning
by: Wang, Zhaoxuan, et al.
Published: (2025)