| Main Authors: | Chen, Tianyu; Lou, Jian; Wang, Wenjie |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.10030 |
Similar Items
Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region
by: Leong, Chak Tou, et al.
Published: (2025)
CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models
by: Xu, Naen, et al.
Published: (2024)
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
by: Wang, Yuhao, et al.
Published: (2025)
Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation
by: Gao, Yilan, et al.
Published: (2026)
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
by: Li, Qinfeng, et al.
Published: (2024)
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
by: Wang, Yu, et al.
Published: (2024)
LLM Safeguard is a Double-Edged Sword: Exploiting False Positives for Denial-of-Service Attacks
by: Zhang, Qingzhao, et al.
Published: (2024)
A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
by: Chen, Tianyu, et al.
Published: (2026)
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
by: Gao, Jianbo, et al.
Published: (2025)
P$^2$RAG: Efficient Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval
by: Ming, Yulong, et al.
Published: (2026)
An AI Agent Execution Environment to Safeguard User Data
by: Stanley, Robert, et al.
Published: (2026)
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
by: Liu, Yue, et al.
Published: (2025)
Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval
by: Zhou, Pengcheng, et al.
Published: (2025)
On the Evidentiary Limits of Membership Inference for Copyright Auditing
by: Ertan, Murat Bilgehan, et al.
Published: (2026)
Exploring and Developing a Pre-Model Safeguard with Draft Models
by: Cai, Hongyu, et al.
Published: (2026)
On Evaluating the Durability of Safeguards for Open-Weight LLMs
by: Qi, Xiangyu, et al.
Published: (2024)
Safeguarding Large Language Models: A Survey
by: Dong, Yi, et al.
Published: (2024)
PromptKeeper: Safeguarding System Prompts for LLMs
by: Jiang, Zhifeng, et al.
Published: (2024)
RTLMarker: Protecting LLM-Generated RTL Copyright via a Hardware Watermarking Framework
by: Wang, Kun, et al.
Published: (2025)
Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control
by: Yu, Zhe, et al.
Published: (2026)
PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation
by: Wang, Baiqiang, et al.
Published: (2025)
Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?
by: Xu, Naen, et al.
Published: (2025)
Safeguarding Federated Learning-based Road Condition Classification
by: Liu, Sheng, et al.
Published: (2025)
Safeguarding AI Agents: Developing and Analyzing Safety Architectures
by: Domkundwar, Ishaan, et al.
Published: (2024)
Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)
Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review
by: Ma, Oubo, et al.
Published: (2026)
Do Multimodal RAG Systems Leak Data? A Comprehensive Evaluation of Membership Inference and Image Caption Retrieval Attacks
by: Al-Lawati, Ali, et al.
Published: (2026)
Embedding with Large Language Models for Classification of HIPAA Safeguard Compliance Rules
by: Rahman, Md Abdur, et al.
Published: (2024)
Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
by: Li, Qinfeng, et al.
Published: (2025)
CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks
by: Li, Siyuan, et al.
Published: (2026)
Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection
by: Lin, Lixing, et al.
Published: (2026)
ME: Trigger Element Combination Backdoor Attack on Copyright Infringement
by: Yang, Feiyu, et al.
Published: (2025)
SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models
by: Zhang, Jiawen, et al.
Published: (2025)
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety
by: Wang, Kun, et al.
Published: (2026)
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)
Deep Learning-based Dual Watermarking for Image Copyright Protection and Authentication
by: Padhi, Sudev Kumar, et al.
Published: (2025)
MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System
by: Kumar, Sonu, et al.
Published: (2025)
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
by: Shen, Zeyu, et al.
Published: (2025)
The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline
by: Wang, Haonan, et al.
Published: (2024)
SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
by: Feng, Yang, et al.
Published: (2025)