| Main Authors: | Belkhiter, Yannis, Zizzo, Giulio, Maffeis, Sergio, Tirupathi, Seshu, Kelleher, John D. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.20994 |
Similar Items
HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment
by: Belkhiter, Yannis, et al.
Published: (2024)
Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring
by: Belkhiter, Yannis, et al.
Published: (2025)
Blue Teaming Function-Calling Agents
by: Dolcetti, Greta, et al.
Published: (2026)
TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping
by: Belkhiter, Yannis, et al.
Published: (2026)
Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets
by: Belkhiter, Yannis, et al.
Published: (2025)
Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference
by: Belkhiter, Yannis, et al.
Published: (2025)
Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience
by: Thakkar, Janvi, et al.
Published: (2023)
Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing
by: Gibert, Daniel, et al.
Published: (2023)
A Robust Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via (De)Randomized Smoothing
by: Gibert, Daniel, et al.
Published: (2024)
A Survey on Agentic Security: Applications, Threats and Defenses
by: Shahriar, Asif, et al.
Published: (2025)
In-Context Representation Hijacking
by: Yona, Itay, et al.
Published: (2025)
Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models
by: Yi, Sibo, et al.
Published: (2025)
PARASITE: Conditional System Prompt Poisoning to Hijack LLMs
by: Pham, Viet, et al.
Published: (2025)
Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning
by: Khan, Tanveer, et al.
Published: (2024)
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models
by: Zhang, Yucheng, et al.
Published: (2024)
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
by: Chae, Kyubyung, et al.
Published: (2025)
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
by: Cornacchia, Giandomenico, et al.
Published: (2024)
Assessing the Impact of Packing on Machine Learning-Based Malware Detection and Classification Systems
by: Gibert, Daniel, et al.
Published: (2024)
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
by: Hou, Xinyi, et al.
Published: (2025)
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space
by: Huang, Yao, et al.
Published: (2025)
Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs
by: Momcilovic, Tomas Bueno, et al.
Published: (2024)
PentestMCP: A Toolkit for Agentic Penetration Testing
by: Ezetta, Zachary, et al.
Published: (2025)
Breaking PEFT Limitations: Leveraging Weak-to-Strong Knowledge Transfer for Backdoor Attacks in LLMs
by: Zhao, Shuai, et al.
Published: (2024)
JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
by: Luo, Weidi, et al.
Published: (2024)
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders
by: Noever, David, et al.
Published: (2024)
Resource Consumption Threats in Large Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)
Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing
by: Gibert, Daniel, et al.
Published: (2024)
Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior
by: Lian, Zhuotao, et al.
Published: (2025)
Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction
by: Wang, Hongtao, et al.
Published: (2026)
MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)
by: Shen, Yi Ting, et al.
Published: (2026)
FNF: Functional Network Fingerprint for Large Language Models
by: Liu, Yiheng, et al.
Published: (2026)
ThreatGPT: An Agentic AI Framework for Enhancing Public Safety through Threat Modeling
by: Zisad, Sharif Noor, et al.
Published: (2025)
May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
by: Pandya, Nishit V., et al.
Published: (2025)
Investigation of Advanced Persistent Threats Network-based Tactics, Techniques and Procedures
by: Alageel, Almuthanna, et al.
Published: (2025)
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
by: Acharya, Nirajan, et al.
Published: (2026)
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)
Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking
by: You, Ziyang, et al.
Published: (2026)
Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models
by: Fieblinger, Romy, et al.
Published: (2024)
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
by: Wu, Zihui, et al.
Published: (2024)
The Ethics of Interaction: Mitigating Security Threats in LLMs
by: Kumar, Ashutosh, et al.
Published: (2024)