:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Tianyu; Lou, Jian; Wang, Wenjie
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.10030

Similar Items

Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region
by: Leong, Chak Tou, et al.
Published: (2025)

CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models
by: Xu, Naen, et al.
Published: (2024)

Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
by: Wang, Yuhao, et al.
Published: (2025)

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation
by: Gao, Yilan, et al.
Published: (2026)

TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
by: Li, Qinfeng, et al.
Published: (2024)

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
by: Wang, Yu, et al.
Published: (2024)

LLM Safeguard is a Double-Edged Sword: Exploiting False Positives for Denial-of-Service Attacks
by: Zhang, Qingzhao, et al.
Published: (2024)

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
by: Chen, Tianyu, et al.
Published: (2026)

AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
by: Gao, Jianbo, et al.
Published: (2025)

P$^2$RAG: Efficient Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval
by: Ming, Yulong, et al.
Published: (2026)

An AI Agent Execution Environment to Safeguard User Data
by: Stanley, Robert, et al.
Published: (2026)

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
by: Liu, Yue, et al.
Published: (2025)

Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval
by: Zhou, Pengcheng, et al.
Published: (2025)

On the Evidentiary Limits of Membership Inference for Copyright Auditing
by: Ertan, Murat Bilgehan, et al.
Published: (2026)

Exploring and Developing a Pre-Model Safeguard with Draft Models
by: Cai, Hongyu, et al.
Published: (2026)

On Evaluating the Durability of Safeguards for Open-Weight LLMs
by: Qi, Xiangyu, et al.
Published: (2024)

Safeguarding Large Language Models: A Survey
by: Dong, Yi, et al.
Published: (2024)

PromptKeeper: Safeguarding System Prompts for LLMs
by: Jiang, Zhifeng, et al.
Published: (2024)

RTLMarker: Protecting LLM-Generated RTL Copyright via a Hardware Watermarking Framework
by: Wang, Kun, et al.
Published: (2025)

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control
by: Yu, Zhe, et al.
Published: (2026)

PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation
by: Wang, Baiqiang, et al.
Published: (2025)

Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?
by: Xu, Naen, et al.
Published: (2025)

Safeguarding Federated Learning-based Road Condition Classification
by: Liu, Sheng, et al.
Published: (2025)

Safeguarding AI Agents: Developing and Analyzing Safety Architectures
by: Domkundwar, Ishaan, et al.
Published: (2024)

Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)

Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review
by: Ma, Oubo, et al.
Published: (2026)

Do Multimodal RAG Systems Leak Data? A Comprehensive Evaluation of Membership Inference and Image Caption Retrieval Attacks
by: Al-Lawati, Ali, et al.
Published: (2026)

Embedding with Large Language Models for Classification of HIPAA Safeguard Compliance Rules
by: Rahman, Md Abdur, et al.
Published: (2024)

Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
by: Li, Qinfeng, et al.
Published: (2025)

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks
by: Li, Siyuan, et al.
Published: (2026)

Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection
by: Lin, Lixing, et al.
Published: (2026)

ME: Trigger Element Combination Backdoor Attack on Copyright Infringement
by: Yang, Feiyu, et al.
Published: (2025)

SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models
by: Zhang, Jiawen, et al.
Published: (2025)

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety
by: Wang, Kun, et al.
Published: (2026)

CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)

Deep Learning-based Dual Watermarking for Image Copyright Protection and Authentication
by: Padhi, Sudev Kumar, et al.
Published: (2025)

MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System
by: Kumar, Sonu, et al.
Published: (2025)

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
by: Shen, Zeyu, et al.
Published: (2025)

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline
by: Wang, Haonan, et al.
Published: (2024)

SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
by: Feng, Yang, et al.
Published: (2025)