Library Catalog

Bibliographic Details
Main Authors:	Belkhiter, Yannis; Zizzo, Giulio; Maffeis, Sergio; Tirupathi, Seshu; Kelleher, John D.
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2604.20994

Similar Items

HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment
by: Belkhiter, Yannis, et al.
Published: (2024)

Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring
by: Belkhiter, Yannis, et al.
Published: (2025)

Blue Teaming Function-Calling Agents
by: Dolcetti, Greta, et al.
Published: (2026)

TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping
by: Belkhiter, Yannis, et al.
Published: (2026)

Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets
by: Belkhiter, Yannis, et al.
Published: (2025)

Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference
by: Belkhiter, Yannis, et al.
Published: (2025)

Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience
by: Thakkar, Janvi, et al.
Published: (2023)

Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing
by: Gibert, Daniel, et al.
Published: (2023)

A Robust Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via (De)Randomized Smoothing
by: Gibert, Daniel, et al.
Published: (2024)

A Survey on Agentic Security: Applications, Threats and Defenses
by: Shahriar, Asif, et al.
Published: (2025)

In-Context Representation Hijacking
by: Yona, Itay, et al.
Published: (2025)

Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models
by: Yi, Sibo, et al.
Published: (2025)

PARASITE: Conditional System Prompt Poisoning to Hijack LLMs
by: Pham, Viet, et al.
Published: (2025)

Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning
by: Khan, Tanveer, et al.
Published: (2024)

HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models
by: Zhang, Yucheng, et al.
Published: (2024)

From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
by: Chae, Kyubyung, et al.
Published: (2025)

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
by: Cornacchia, Giandomenico, et al.
Published: (2024)

Assessing the Impact of Packing on Machine Learning-Based Malware Detection and Classification Systems
by: Gibert, Daniel, et al.
Published: (2024)

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
by: Hou, Xinyi, et al.
Published: (2025)

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space
by: Huang, Yao, et al.
Published: (2025)

Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs
by: Momcilovic, Tomas Bueno, et al.
Published: (2024)

PentestMCP: A Toolkit for Agentic Penetration Testing
by: Ezetta, Zachary, et al.
Published: (2025)

Breaking PEFT Limitations: Leveraging Weak-to-Strong Knowledge Transfer for Backdoor Attacks in LLMs
by: Zhao, Shuai, et al.
Published: (2024)

JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
by: Luo, Weidi, et al.
Published: (2024)

Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders
by: Noever, David, et al.
Published: (2024)

Resource Consumption Threats in Large Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)

Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing
by: Gibert, Daniel, et al.
Published: (2024)

Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior
by: Lian, Zhuotao, et al.
Published: (2025)

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction
by: Wang, Hongtao, et al.
Published: (2026)

MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)
by: Shen, Yi Ting, et al.
Published: (2026)

FNF: Functional Network Fingerprint for Large Language Models
by: Liu, Yiheng, et al.
Published: (2026)

ThreatGPT: An Agentic AI Framework for Enhancing Public Safety through Threat Modeling
by: Zisad, Sharif Noor, et al.
Published: (2025)

May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
by: Pandya, Nishit V., et al.
Published: (2025)

Investigation of Advanced Persistent Threats Network-based Tactics, Techniques and Procedures
by: Alageel, Almuthanna, et al.
Published: (2025)

A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
by: Acharya, Nirajan, et al.
Published: (2026)

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking
by: You, Ziyang, et al.
Published: (2026)

Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models
by: Fieblinger, Romy, et al.
Published: (2024)

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
by: Wu, Zihui, et al.
Published: (2024)

The Ethics of Interaction: Mitigating Security Threats in LLMs
by: Kumar, Ashutosh, et al.
Published: (2024)