:: Library Catalog

Bibliographic Details
Main Authors:	Ivry, Dror; Nahum, Oran
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.05446

Similar Items

Paladin-mini: A Compact and Efficient Grounding Model Excelling in Real-World Scenarios
by: Ivry, Dror, et al.
Published: (2025)

OET: Optimization-based prompt injection Evaluation Toolkit
by: Pan, Jinsheng, et al.
Published: (2025)

Exfiltration of personal information from ChatGPT via prompt injection
by: Schwartzman, Gregory
Published: (2024)

CyberSentinel: An Emergent Threat Detection System for AI Security
by: Tallam, Krti
Published: (2025)

DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
by: Liu, Yupei, et al.
Published: (2025)

Backdoor Sentinel: Detecting and Detoxifying Backdoors in Diffusion Models via Temporal Noise Consistency
by: Wang, Bingzheng, et al.
Published: (2026)

SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
by: Feng, Yang, et al.
Published: (2025)

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern
by: Pang, Xiaoyi, et al.
Published: (2026)

Large Language Model Sentinel: LLM Agent for Adversarial Purification
by: Lin, Guang, et al.
Published: (2024)

WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents
by: Wang, Xilong, et al.
Published: (2026)

The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models
by: Xu, Zhiyuan, et al.
Published: (2025)

Model Inversion Attack against Federated Unlearning
by: Zhou, Lei, et al.
Published: (2025)

False Claims against Model Ownership Resolution
by: Liu, Jian, et al.
Published: (2023)

EVA: Editing for Versatile Alignment against Jailbreaks
by: Wang, Yi, et al.
Published: (2026)

QUEEN: Query Unlearning against Model Extraction
by: Chen, Huajie, et al.
Published: (2024)

CSC: Turning the Adversary's Poison against Itself
by: Shi, Yuchen, et al.
Published: (2026)

Fooling LLM graders into giving better grades through neural activity guided adversarial prompting
by: Yamamura, Atsushi, et al.
Published: (2024)

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
by: Gao, ZhenZhe, et al.
Published: (2024)

Defending against Indirect Prompt Injection by Instruction Detection
by: Wen, Tongyu, et al.
Published: (2025)

Optimizing Adaptive Attacks against Watermarks for Language Models
by: Diaa, Abdulrahman, et al.
Published: (2024)

Adversarial attacks against Modern Vision-Language Models
by: La Torre, Alejandro Paredes
Published: (2026)

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems
by: Patil, KrishnaSaiReddy
Published: (2026)

Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry
by: Torpmann-Hagen, Birk, et al.
Published: (2025)

SDD: Self-Degraded Defense against Malicious Fine-tuning
by: Chen, Zixuan, et al.
Published: (2025)

MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models
by: Cheng, Xueqi, et al.
Published: (2025)

A Critical Evaluation of Defenses against Prompt Injection Attacks
by: Jia, Yuqi, et al.
Published: (2025)

ShallowJail: Steering Jailbreaks against Large Language Models
by: Liu, Shang, et al.
Published: (2026)

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
by: Wang, Junyu, et al.
Published: (2025)

Integrating Identity-Based Identification against Adaptive Adversaries in Federated Learning
by: Szelag, Jakub Kacper, et al.
Published: (2025)

CUBA: Controlled Untargeted Backdoor Attack against Deep Neural Networks
by: Wu, Yinghao, et al.
Published: (2025)

SoK: Robustness in Large Language Models against Jailbreak Attacks
by: Xu, Feiyue, et al.
Published: (2026)

Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding
by: Hu, Yuke, et al.
Published: (2023)

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models
by: Chen, Tianxin, et al.
Published: (2026)

FedCC: Robust Federated Learning against Model Poisoning Attacks
by: Jeong, Hyejun, et al.
Published: (2022)

Generating Is Believing: Membership Inference Attacks against Retrieval-Augmented Generation
by: Li, Yuying, et al.
Published: (2024)

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
by: Cunningham, Hoagy, et al.
Published: (2026)

Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks
by: Xu, Yixiao, et al.
Published: (2025)

CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)

Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
by: Fu, Haowei, et al.
Published: (2025)