| Main Authors: | Bhatt, Manish, Wood, Adrian, Habler, Idan, Al-Kahfah, Ammar |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.00042 |
Similar Items
Manifold of Failure: Behavioral Attraction Basins in Language Models
by: Munshi, Sarthak, et al.
Published: (2026)
The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?
by: Bhatt, Manish, et al.
Published: (2026)
ETDI: Mitigating Tool Squatting and Rug Pull Attacks in Model Context Protocol (MCP) by using OAuth-Enhanced Tool Definitions and Policy-Based Access Control
by: Bhatt, Manish, et al.
Published: (2025)
COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents
by: Bhatt, Manish, et al.
Published: (2025)
MAIF: Enforcing AI Trust and Provenance with an Artifact-Centric Agentic Paradigm
by: Narajala, Vineeth Sai, et al.
Published: (2025)
Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies
by: Narajala, Vineeth Sai, et al.
Published: (2025)
Securing GenAI Multi-Agent Systems Against Tool Squatting: A Zero Trust Registry-Based Approach
by: Narajala, Vineeth Sai, et al.
Published: (2025)
Building A Secure Agentic AI Application Leveraging A2A Protocol
by: Habler, Idan, et al.
Published: (2025)
From Tool Orchestration to Code Execution: A Study of MCP Design Choices
by: Felendler, Yuval, et al.
Published: (2026)
From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
by: Sinha, Anusha, et al.
Published: (2025)
Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems
by: Habler, Idan, et al.
Published: (2026)
Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models
by: Bhatt, Manish
Published: (2026)
Agent Capability Negotiation and Binding Protocol (ACNBP)
by: Huang, Ken, et al.
Published: (2025)
Red Teaming AI Red Teaming
by: Majumdar, Subhabrata, et al.
Published: (2025)
Agent Name Service (ANS): A Universal Directory for Secure AI Agent Discovery and Interoperability
by: Huang, Ken, et al.
Published: (2025)
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
by: Rawat, Ambrish, et al.
Published: (2024)
BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing
by: Kaplan, Caelin, et al.
Published: (2025)
Red Teaming Large Reasoning Models
by: Chen, Jiawei, et al.
Published: (2025)
Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations
by: Wang, Cheng, et al.
Published: (2024)
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
by: Liang, Jiacheng, et al.
Published: (2026)
Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts
by: Liu, Yi, et al.
Published: (2024)
Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment
by: Xie, Yuchong, et al.
Published: (2025)
Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems
by: Atta, Hammad, et al.
Published: (2025)
A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control
by: Huang, Ken, et al.
Published: (2025)
Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours
by: Dheekonda, Raja Sekhar Rao, et al.
Published: (2026)
Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
by: Zhang, Jiawei, et al.
Published: (2025)
Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
by: Zhang, Jinchuan, et al.
Published: (2024)
Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
by: Wirth, Manuel
Published: (2026)
DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints
by: Zhao, Andrew, et al.
Published: (2024)
Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)
A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications
by: Srivastava, Shruti, et al.
Published: (2026)
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
by: Zhou, Andy, et al.
Published: (2025)
Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models
by: Barrett, Anthony M., et al.
Published: (2024)
Mind the Web: The Security of Web Use Agents
by: Shapira, Avishag, et al.
Published: (2025)
When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
by: Ou, Haoran, et al.
Published: (2025)
PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
by: Du, Pengfei
Published: (2025)
Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks
by: Halloran, John T., et al.
Published: (2026)
Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis
by: Thornton, Scott
Published: (2026)
Large Language Model Integration with Reinforcement Learning to Augment Decision-Making in Autonomous Cyber Operations
by: Tholl, Konur, et al.
Published: (2025)