:: Library Catalog

Bibliographic Details
Main Author:	Mouzouni, Charafeddine
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2604.04561

Similar Items

Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
by: Mouzouni, Charafeddine
Published: (2026)

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents
by: Li, Xueyi, et al.
Published: (2026)

Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
by: Ahmed, Mohamed, et al.
Published: (2025)

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
by: Nihal, Ragib Amin, et al.
Published: (2025)

Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents
by: Nawal, Aditya, et al.
Published: (2026)

Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
by: Ji, Zimo, et al.
Published: (2025)

BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
by: Tong, Terry, et al.
Published: (2025)

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
by: Shen, Guangyu, et al.
Published: (2024)

LLM Agents can Autonomously Exploit One-day Vulnerabilities
by: Fang, Richard, et al.
Published: (2024)

Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
by: Simoni, Marco, et al.
Published: (2025)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
by: Li, Xirui, et al.
Published: (2024)

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
by: Sahoo, Devanshu, et al.
Published: (2025)

Contextualized Privacy Defense for LLM Agents
by: Wen, Yule, et al.
Published: (2026)

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges
by: Ding, Ruomeng, et al.
Published: (2026)

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
by: Yang, Wenkai, et al.
Published: (2024)

NeuroFilter: Privacy Guardrails for Conversational LLM Agents
by: Das, Saswat, et al.
Published: (2026)

Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
by: Das, Saswat, et al.
Published: (2025)

Searching for Privacy Risks in LLM Agents via Simulation
by: Zhang, Yanzhe, et al.
Published: (2025)

Exploring Backdoor Vulnerabilities of Chat Models
by: Hao, Yunzhuo, et al.
Published: (2024)

When Agents "Misremember" Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
by: Xu, Naen, et al.
Published: (2026)

Large Language Model Sentinel: LLM Agent for Adversarial Purification
by: Lin, Guang, et al.
Published: (2024)

A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy
by: Correia, Pedro H. Barcha, et al.
Published: (2026)

Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
by: Benjamin, Victoria, et al.
Published: (2024)

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
by: Liu, Yi, et al.
Published: (2026)

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
by: Lee, Hyomin, et al.
Published: (2026)

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems
by: Wang, Liwen, et al.
Published: (2025)

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)

From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security
by: Basic, Enna, et al.
Published: (2024)

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
by: Qu, Yubin, et al.
Published: (2026)

No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents
by: Yang, Tiankai, et al.
Published: (2026)

Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
by: Zhao, Andrew, et al.
Published: (2025)

Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
by: Gibbs, Tom, et al.
Published: (2024)

MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory
by: Wang, Yuhui, et al.
Published: (2026)

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
by: An, Hengyu, et al.
Published: (2025)

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
by: Zhao, Shuai, et al.
Published: (2024)

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
by: Fu, Tingchen, et al.
Published: (2024)

Code Vulnerability Detection Across Different Programming Languages with AI Models
by: Humran, Hael Abdulhakim Ali, et al.
Published: (2025)

In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
by: Shetty, Anudeex, et al.
Published: (2026)