| Main Authors: | Ivry, Dror, Nahum, Oran |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.05446 |
Similar Items
Paladin-mini: A Compact and Efficient Grounding Model Excelling in Real-World Scenarios
by: Ivry, Dror, et al.
Published: (2025)
OET: Optimization-based prompt injection Evaluation Toolkit
by: Pan, Jinsheng, et al.
Published: (2025)
Exfiltration of personal information from ChatGPT via prompt injection
by: Schwartzman, Gregory
Published: (2024)
CyberSentinel: An Emergent Threat Detection System for AI Security
by: Tallam, Krti
Published: (2025)
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
by: Liu, Yupei, et al.
Published: (2025)
Backdoor Sentinel: Detecting and Detoxifying Backdoors in Diffusion Models via Temporal Noise Consistency
by: Wang, Bingzheng, et al.
Published: (2026)
SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
by: Feng, Yang, et al.
Published: (2025)
DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern
by: Pang, Xiaoyi, et al.
Published: (2026)
Large Language Model Sentinel: LLM Agent for Adversarial Purification
by: Lin, Guang, et al.
Published: (2024)
WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents
by: Wang, Xilong, et al.
Published: (2026)
The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models
by: Xu, Zhiyuan, et al.
Published: (2025)
Model Inversion Attack against Federated Unlearning
by: Zhou, Lei, et al.
Published: (2025)
False Claims against Model Ownership Resolution
by: Liu, Jian, et al.
Published: (2023)
EVA: Editing for Versatile Alignment against Jailbreaks
by: Wang, Yi, et al.
Published: (2026)
QUEEN: Query Unlearning against Model Extraction
by: Chen, Huajie, et al.
Published: (2024)
CSC: Turning the Adversary's Poison against Itself
by: Shi, Yuchen, et al.
Published: (2026)
Fooling LLM graders into giving better grades through neural activity guided adversarial prompting
by: Yamamura, Atsushi, et al.
Published: (2024)
STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)
Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
by: Gao, ZhenZhe, et al.
Published: (2024)
Defending against Indirect Prompt Injection by Instruction Detection
by: Wen, Tongyu, et al.
Published: (2025)
Optimizing Adaptive Attacks against Watermarks for Language Models
by: Diaa, Abdulrahman, et al.
Published: (2024)
Adversarial attacks against Modern Vision-Language Models
by: La Torre, Alejandro Paredes
Published: (2026)
SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems
by: Patil, KrishnaSaiReddy
Published: (2026)
Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry
by: Torpmann-Hagen, Birk, et al.
Published: (2025)
SDD: Self-Degraded Defense against Malicious Fine-tuning
by: Chen, Zixuan, et al.
Published: (2025)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models
by: Cheng, Xueqi, et al.
Published: (2025)
A Critical Evaluation of Defenses against Prompt Injection Attacks
by: Jia, Yuqi, et al.
Published: (2025)
ShallowJail: Steering Jailbreaks against Large Language Models
by: Liu, Shang, et al.
Published: (2026)
COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
by: Wang, Junyu, et al.
Published: (2025)
Integrating Identity-Based Identification against Adaptive Adversaries in Federated Learning
by: Szelag, Jakub Kacper, et al.
Published: (2025)
CUBA: Controlled Untargeted Backdoor Attack against Deep Neural Networks
by: Wu, Yinghao, et al.
Published: (2025)
SoK: Robustness in Large Language Models against Jailbreak Attacks
by: Xu, Feiyue, et al.
Published: (2026)
Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding
by: Hu, Yuke, et al.
Published: (2023)
Semantic-level Backdoor Attack against Text-to-Image Diffusion Models
by: Chen, Tianxin, et al.
Published: (2026)
FedCC: Robust Federated Learning against Model Poisoning Attacks
by: Jeong, Hyejun, et al.
Published: (2022)
Generating Is Believing: Membership Inference Attacks against Retrieval-Augmented Generation
by: Li, Yuying, et al.
Published: (2024)
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
by: Cunningham, Hoagy, et al.
Published: (2026)
Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks
by: Xu, Yixiao, et al.
Published: (2025)
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
by: Fu, Haowei, et al.
Published: (2025)