| Main Authors: | Hackett, William, Birch, Lewis, Trawicki, Stefan, Suri, Neeraj, Garraghan, Peter |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.11168 |
Similar Items
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
by: Young, Richard J.
Published: (2025)
Detecting Prompt Injection Attacks Against Application Using Classifiers
by: Shaheer, Safwan, et al.
Published: (2025)
Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks
by: Shaheer, Safwan, et al.
Published: (2025)
Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
by: Pai, Aaditya
Published: (2026)
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?
by: Xin, Yuan, et al.
Published: (2025)
PromptSAM+: Malware Detection based on Prompt Segment Anything Model
by: Wei, Xingyuan, et al.
Published: (2024)
Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously
by: Adiletta, Andrew, et al.
Published: (2025)
Defending against Backdoor Attacks via Module Switching
by: Li, Weijun, et al.
Published: (2025)
Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design
by: Charles, Richard M., et al.
Published: (2025)
How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks
by: Wang, Yanshu, et al.
Published: (2026)
Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models
by: Del Rosario, Ron F.
Published: (2025)
Prompted Contextual Vectors for Spear-Phishing Detection
by: Nahmias, Daniel, et al.
Published: (2024)
ConfusionPrompt: Practical Private Inference for Online Large Language Models
by: Mai, Peihua, et al.
Published: (2023)
UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
by: Geng, Runpeng, et al.
Published: (2025)
Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache
by: Wang, Xinhai, et al.
Published: (2026)
Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents
by: Patlan, Atharv Singh, et al.
Published: (2025)
DWFS-Obfuscation: Dynamic Weighted Feature Selection for Robust Malware Familial Classification under Obfuscation
by: Wei, Xingyuan, et al.
Published: (2025)
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models
by: Shao, Yangguang, et al.
Published: (2025)
sudoLLM: On Multi-role Alignment of Language Models
by: Saha, Soumadeep, et al.
Published: (2025)
Efficient LLM Safety Evaluation through Multi-Agent Debate
by: Lin, Dachuan, et al.
Published: (2025)
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
by: Wang, Haochuan Kevin, et al.
Published: (2026)
GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing
by: Zhang, Peiyan, et al.
Published: (2025)
AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification
by: Zhang, Tian, et al.
Published: (2026)
Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks
by: Kurian, Ashley, et al.
Published: (2025)
Reducing Information Overload: Because Even Security Experts Need to Blink
by: Kuehn, Philipp, et al.
Published: (2022)
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
by: Liu, Aiwei, et al.
Published: (2024)
MarkLLM: An Open-Source Toolkit for LLM Watermarking
by: Pan, Leyi, et al.
Published: (2024)
Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers
by: Othman, Refat
Published: (2026)
PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models
by: Seddik, Issam, et al.
Published: (2025)
Safeguarding Efficacy in Large Language Models: Evaluating Resistance to Human-Written and Algorithmic Adversarial Prompts
by: Downey-Webb, Tiarnaigh, et al.
Published: (2025)
Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts
by: Peh, Steven
Published: (2025)
Lightweight LLMs for Network Attack Detection in IoT Networks
by: Sudasinghe, Piyumi Bhagya, et al.
Published: (2026)
SecEmb: Sparsity-Aware Secure Federated Learning of On-Device Recommender System with Large Embedding
by: Mai, Peihua, et al.
Published: (2025)
Split-and-Denoise: Protect large language model inference with local differential privacy
by: Mai, Peihua, et al.
Published: (2023)
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
by: Verma, Apurv, et al.
Published: (2024)
AI Safeguards, Generative AI and the Pandora Box: AI Safety Measures to Protect Businesses and Personal Reputation
by: Kumar, Prasanna
Published: (2026)
JavelinGuard: Low-Cost Transformer Architectures for LLM Security
by: Datta, Yash, et al.
Published: (2025)
A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts
by: Young, Richard J., et al.
Published: (2026)
VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation
by: Miculicich, Lesly, et al.
Published: (2025)
$δ$-STEAL: LLM Stealing Attack with Local Differential Privacy
by: Dang, Kieu, et al.
Published: (2025)