| Main Authors: | Collu, Matteo Gioele, Conte, Riccardo, Giaretta, Alberto, Kleyko, Denis, Conti, Mauro, Zavatteri, Matteo, Confalonieri, Roberto |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.28553 |
Similar Items
Misleading Large Language Models used (or misused) in Scientific Peer-Reviewing via Hidden Prompt-Injection Attacks
by: Collu, Matteo Gioele, et al.
Published: (2025)
Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection
by: Hu, Xulin, et al.
Published: (2026)
Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
by: Collu, Matteo Gioele, et al.
Published: (2023)
Differential Area Analysis for Ransomware: Attacks, Countermeasures, and Limitations
by: Venturini, Marco, et al.
Published: (2023)
The Road Less Traveled: Investigating Robustness and Explainability in CNN Malware Detection
by: Brosolo, Matteo, et al.
Published: (2025)
Security and Privacy in Virtual Reality: A Literature Survey
by: Giaretta, Alberto
Published: (2022)
Through the Static: Demystifying Malware Visualization via Explainability
by: Brosolo, Matteo, et al.
Published: (2025)
MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
by: Halloran, John
Published: (2025)
CANTXSec: A Deterministic Intrusion Detection and Prevention System for CAN Bus Monitoring ECU Activations
by: Donadel, Denis, et al.
Published: (2025)
Exploiting AI for Attacks: On the Interplay between Adversarial AI and Offensive AI
by: Schröer, Saskia Laura, et al.
Published: (2025)
LLMs Can Unlearn Refusal with Only 1,000 Benign Samples
by: Guo, Yangyang, et al.
Published: (2026)
Cybersecurity and Embodiment Integrity for Modern Robots: A Conceptual Framework
by: Giaretta, Alberto, et al.
Published: (2024)
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
by: Kumar, Priyanshu, et al.
Published: (2024)
Safety is Not Only About Refusal: Reasoning-Enhanced Fine-tuning for Interpretable LLM Safety
by: Zhang, Yuyou, et al.
Published: (2025)
Can LLMs Classify CVEs? Investigating LLMs Capabilities in Computing CVSS Vectors
by: Marchiori, Francesco, et al.
Published: (2025)
HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor
by: Wu, Zihui, et al.
Published: (2025)
Do Reasoning LLMs Refuse What They Infer in Long Contexts?
by: Fu, Yu, et al.
Published: (2026)
CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems
by: Marchiori, Francesco, et al.
Published: (2024)
Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
by: Campbell, David, et al.
Published: (2026)
Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
by: Zhang, Wentao, et al.
Published: (2026)
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
by: Lan, Wenhao, et al.
Published: (2026)
Furina: Fragmented Uncertainty-Driven Refusal Instability Attack
by: Wu, Tongxi, et al.
Published: (2026)
Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction
by: Xie, Yuanbo, et al.
Published: (2025)
SARSteer: Safeguarding Large Audio-Language Models via Safe-Ablated Refusal Steering
by: Lin, Weilin, et al.
Published: (2025)
Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models
by: Castagnaro, Alberto, et al.
Published: (2024)
Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment
by: Li, Jie, et al.
Published: (2024)
Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
by: Yin, Qingyu, et al.
Published: (2025)
Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models
by: Heverin, Thomas
Published: (2026)
From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security
by: Basic, Enna, et al.
Published: (2024)
Hyperloop: A Cybersecurity Perspective
by: Brighente, Alessandro, et al.
Published: (2022)
A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors
by: Xu, Zhenyu, et al.
Published: (2026)
From Chat Control to Robot Control: Implications of the Chat Control Proposal for Human-Robot Interaction
by: Akalin, Neziha, et al.
Published: (2026)
QUACK! Making the (Rubber) Ducky Talk: A Systematic Study of Keystroke Dynamics for HID Injection Detection
by: Lotto, Alessandro, et al.
Published: (2026)
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
by: Sheng, Leheng, et al.
Published: (2025)
E-Trojans: Ransomware, Tracking, DoS, and Data Leaks on Battery-powered Embedded Systems
by: Casagrande, Marco, et al.
Published: (2024)
Exploiting Kubernetes' Image Pull Implementation to Deny Node Availability
by: Knob, Luis Augusto Dias, et al.
Published: (2024)
Leaky Batteries: A Novel Set of Side-Channel Attacks on Electric Vehicles
by: Marchiori, Francesco, et al.
Published: (2025)
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
by: Chae, Kyubyung, et al.
Published: (2025)
A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models
by: Linder, Noa, et al.
Published: (2026)
Security through the Eyes of AI: How Visualization is Shaping Malware Detection
by: Brosolo, Matteo, et al.
Published: (2025)