Library Catalog

Bibliographic Details
Main Authors:	Collu, Matteo Gioele, Conte, Riccardo, Giaretta, Alberto, Kleyko, Denis, Conti, Mauro, Zavatteri, Matteo, Confalonieri, Roberto
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Cryptography and Security
Online Access:	https://arxiv.org/abs/2605.28553

Similar Items

Misleading Large Language Models used (or misused) in Scientific Peer-Reviewing via Hidden Prompt-Injection Attacks
by: Collu, Matteo Gioele, et al.
Published: (2025)

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection
by: Hu, Xulin, et al.
Published: (2026)

Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
by: Collu, Matteo Gioele, et al.
Published: (2023)

Differential Area Analysis for Ransomware: Attacks, Countermeasures, and Limitations
by: Venturini, Marco, et al.
Published: (2023)

The Road Less Traveled: Investigating Robustness and Explainability in CNN Malware Detection
by: Brosolo, Matteo, et al.
Published: (2025)

Security and Privacy in Virtual Reality: A Literature Survey
by: Giaretta, Alberto
Published: (2022)

Through the Static: Demystifying Malware Visualization via Explainability
by: Brosolo, Matteo, et al.
Published: (2025)

MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
by: Halloran, John
Published: (2025)

CANTXSec: A Deterministic Intrusion Detection and Prevention System for CAN Bus Monitoring ECU Activations
by: Donadel, Denis, et al.
Published: (2025)

Exploiting AI for Attacks: On the Interplay between Adversarial AI and Offensive AI
by: Schröer, Saskia Laura, et al.
Published: (2025)

LLMs Can Unlearn Refusal with Only 1,000 Benign Samples
by: Guo, Yangyang, et al.
Published: (2026)

Cybersecurity and Embodiment Integrity for Modern Robots: A Conceptual Framework
by: Giaretta, Alberto, et al.
Published: (2024)

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
by: Kumar, Priyanshu, et al.
Published: (2024)

Safety is Not Only About Refusal: Reasoning-Enhanced Fine-tuning for Interpretable LLM Safety
by: Zhang, Yuyou, et al.
Published: (2025)

Can LLMs Classify CVEs? Investigating LLMs Capabilities in Computing CVSS Vectors
by: Marchiori, Francesco, et al.
Published: (2025)

HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor
by: Wu, Zihui, et al.
Published: (2025)

Do Reasoning LLMs Refuse What They Infer in Long Contexts?
by: Fu, Yu, et al.
Published: (2026)

CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems
by: Marchiori, Francesco, et al.
Published: (2024)

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
by: Campbell, David, et al.
Published: (2026)

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
by: Zhang, Wentao, et al.
Published: (2026)

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
by: Lan, Wenhao, et al.
Published: (2026)

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack
by: Wu, Tongxi, et al.
Published: (2026)

Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction
by: Xie, Yuanbo, et al.
Published: (2025)

SARSteer: Safeguarding Large Audio-Language Models via Safe-Ablated Refusal Steering
by: Lin, Weilin, et al.
Published: (2025)

Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models
by: Castagnaro, Alberto, et al.
Published: (2024)

Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment
by: Li, Jie, et al.
Published: (2024)

Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
by: Yin, Qingyu, et al.
Published: (2025)

Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models
by: Heverin, Thomas
Published: (2026)

From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security
by: Basic, Enna, et al.
Published: (2024)

Hyperloop: A Cybersecurity Perspective
by: Brighente, Alessandro, et al.
Published: (2022)

A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors
by: Xu, Zhenyu, et al.
Published: (2026)

From Chat Control to Robot Control: Implications of the Chat Control Proposal for Human-Robot Interaction
by: Akalin, Neziha, et al.
Published: (2026)

QUACK! Making the (Rubber) Ducky Talk: A Systematic Study of Keystroke Dynamics for HID Injection Detection
by: Lotto, Alessandro, et al.
Published: (2026)

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
by: Sheng, Leheng, et al.
Published: (2025)

E-Trojans: Ransomware, Tracking, DoS, and Data Leaks on Battery-powered Embedded Systems
by: Casagrande, Marco, et al.
Published: (2024)

Exploiting Kubernetes' Image Pull Implementation to Deny Node Availability
by: Knob, Luis Augusto Dias, et al.
Published: (2024)

Leaky Batteries: A Novel Set of Side-Channel Attacks on Electric Vehicles
by: Marchiori, Francesco, et al.
Published: (2025)

From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
by: Chae, Kyubyung, et al.
Published: (2025)

A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models
by: Linder, Noa, et al.
Published: (2026)

Security through the Eyes of AI: How Visualization is Shaping Malware Detection
by: Brosolo, Matteo, et al.
Published: (2025)