:: Library Catalog

Bibliographic Details
Main Authors:	Bhatt, Manish; Wood, Adrian; Habler, Idan; Al-Kahfah, Ammar
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2601.00042

Similar Items

Manifold of Failure: Behavioral Attraction Basins in Language Models
by: Munshi, Sarthak, et al.
Published: (2026)

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?
by: Bhatt, Manish, et al.
Published: (2026)

ETDI: Mitigating Tool Squatting and Rug Pull Attacks in Model Context Protocol (MCP) by using OAuth-Enhanced Tool Definitions and Policy-Based Access Control
by: Bhatt, Manish, et al.
Published: (2025)

COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents
by: Bhatt, Manish, et al.
Published: (2025)

MAIF: Enforcing AI Trust and Provenance with an Artifact-Centric Agentic Paradigm
by: Narajala, Vineeth Sai, et al.
Published: (2025)

Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies
by: Narajala, Vineeth Sai, et al.
Published: (2025)

Securing GenAI Multi-Agent Systems Against Tool Squatting: A Zero Trust Registry-Based Approach
by: Narajala, Vineeth Sai, et al.
Published: (2025)

Building A Secure Agentic AI Application Leveraging A2A Protocol
by: Habler, Idan, et al.
Published: (2025)

From Tool Orchestration to Code Execution: A Study of MCP Design Choices
by: Felendler, Yuval, et al.
Published: (2026)

From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
by: Sinha, Anusha, et al.
Published: (2025)

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems
by: Habler, Idan, et al.
Published: (2026)

Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models
by: Bhatt, Manish
Published: (2026)

Agent Capability Negotiation and Binding Protocol (ACNBP)
by: Huang, Ken, et al.
Published: (2025)

Red Teaming AI Red Teaming
by: Majumdar, Subhabrata, et al.
Published: (2025)

Agent Name Service (ANS): A Universal Directory for Secure AI Agent Discovery and Interoperability
by: Huang, Ken, et al.
Published: (2025)

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
by: Rawat, Ambrish, et al.
Published: (2024)

BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing
by: Kaplan, Caelin, et al.
Published: (2025)

Red Teaming Large Reasoning Models
by: Chen, Jiawei, et al.
Published: (2025)

Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations
by: Wang, Cheng, et al.
Published: (2024)

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
by: Liang, Jiacheng, et al.
Published: (2026)

Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts
by: Liu, Yi, et al.
Published: (2024)

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment
by: Xie, Yuchong, et al.
Published: (2025)

Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems
by: Atta, Hammad, et al.
Published: (2025)

A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control
by: Huang, Ken, et al.
Published: (2025)

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours
by: Dheekonda, Raja Sekhar Rao, et al.
Published: (2026)

Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
by: Zhang, Jiawei, et al.
Published: (2025)

Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
by: Zhang, Jinchuan, et al.
Published: (2024)

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
by: Wirth, Manuel
Published: (2026)

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints
by: Zhao, Andrew, et al.
Published: (2024)

Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)

A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications
by: Srivastava, Shruti, et al.
Published: (2026)

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
by: Zhou, Andy, et al.
Published: (2025)

Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models
by: Barrett, Anthony M., et al.
Published: (2024)

Mind the Web: The Security of Web Use Agents
by: Shapira, Avishag, et al.
Published: (2025)

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
by: Ou, Haoran, et al.
Published: (2025)

PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
by: Du, Pengfei
Published: (2025)

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks
by: Halloran, John T., et al.
Published: (2026)

Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis
by: Thornton, Scott
Published: (2026)

Large Language Model Integration with Reinforcement Learning to Augment Decision-Making in Autonomous Cyber Operations
by: Tholl, Konur, et al.
Published: (2025)