| Main Author: | Mouzouni, Charafeddine |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.04561 |
Similar Items
Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
by: Mouzouni, Charafeddine
Published: (2026)
GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents
by: Li, Xueyi, et al.
Published: (2026)
Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
by: Ahmed, Mohamed, et al.
Published: (2025)
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
by: Nihal, Ragib Amin, et al.
Published: (2025)
Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents
by: Nawal, Aditya, et al.
Published: (2026)
Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
by: Ji, Zimo, et al.
Published: (2025)
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
by: Tong, Terry, et al.
Published: (2025)
Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
by: Shen, Guangyu, et al.
Published: (2024)
LLM Agents can Autonomously Exploit One-day Vulnerabilities
by: Fang, Richard, et al.
Published: (2024)
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
by: Simoni, Marco, et al.
Published: (2025)
Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)
DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
by: Li, Xirui, et al.
Published: (2024)
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
by: Sahoo, Devanshu, et al.
Published: (2025)
Contextualized Privacy Defense for LLM Agents
by: Wen, Yule, et al.
Published: (2026)
Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges
by: Ding, Ruomeng, et al.
Published: (2026)
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
by: Yang, Wenkai, et al.
Published: (2024)
NeuroFilter: Privacy Guardrails for Conversational LLM Agents
by: Das, Saswat, et al.
Published: (2026)
Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
by: Das, Saswat, et al.
Published: (2025)
Searching for Privacy Risks in LLM Agents via Simulation
by: Zhang, Yanzhe, et al.
Published: (2025)
Exploring Backdoor Vulnerabilities of Chat Models
by: Hao, Yunzhuo, et al.
Published: (2024)
When Agents "Misremember" Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
by: Xu, Naen, et al.
Published: (2026)
Large Language Model Sentinel: LLM Agent for Adversarial Purification
by: Lin, Guang, et al.
Published: (2024)
A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy
by: Correia, Pedro H. Barcha, et al.
Published: (2026)
Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
by: Benjamin, Victoria, et al.
Published: (2024)
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
by: Liu, Yi, et al.
Published: (2026)
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
by: Lee, Hyomin, et al.
Published: (2026)
IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems
by: Wang, Liwen, et al.
Published: (2025)
SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)
From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security
by: Basic, Enna, et al.
Published: (2024)
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
by: Qu, Yubin, et al.
Published: (2026)
No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents
by: Yang, Tiankai, et al.
Published: (2026)
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
by: Zhao, Andrew, et al.
Published: (2025)
Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
by: Gibbs, Tom, et al.
Published: (2024)
MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory
by: Wang, Yuhui, et al.
Published: (2026)
IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
by: An, Hengyu, et al.
Published: (2025)
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)
Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
by: Zhao, Shuai, et al.
Published: (2024)
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
by: Fu, Tingchen, et al.
Published: (2024)
Code Vulnerability Detection Across Different Programming Languages with AI Models
by: Humran, Hael Abdulhakim Ali, et al.
Published: (2025)
In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
by: Shetty, Anudeex, et al.
Published: (2026)