:: Library Catalog

Bibliographic Details
Main Author:	de Gregorio, Alfonso
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Computation and Language
Online Access:	https://arxiv.org/abs/2505.17109

Similar Items

Mitigating Jailbreaks with Intent-Aware LLMs
by: Yeo, Wei Jie, et al.
Published: (2025)

SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems
by: Bodea, Andreea-Elena, et al.
Published: (2026)

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
by: Wang, Huandong, et al.
Published: (2025)

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
by: Mathew, Yohan, et al.
Published: (2024)

The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
by: Xiong, Alexander, et al.
Published: (2025)

The Ethics of Interaction: Mitigating Security Threats in LLMs
by: Kumar, Ashutosh, et al.
Published: (2024)

Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs
by: Wang, Jiawen, et al.
Published: (2025)

Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective
by: Tshimula, Jean Marie, et al.
Published: (2024)

Building Resilient SMEs: Harnessing Large Language Models for Cyber Security in Australia
by: Kereopa-Yorke, Benjamin
Published: (2023)

The Incoherency Risk in the EU's New Cyber Security Policies
by: Ruohonen, Jukka
Published: (2024)

LLM Cyber Evaluations Don't Capture Real-World Risk
by: Lukošiūtė, Kamilė, et al.
Published: (2025)

From Retrieval to Reasoning: A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions
by: Peng, Jiaren, et al.
Published: (2025)

CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?
by: Chen, Xiangsen, et al.
Published: (2026)

Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
by: Joshi, Kunj, et al.
Published: (2025)

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
by: Luo, Zhifan, et al.
Published: (2025)

PolicyLR: A Logic Representation For Privacy Policies
by: Hooda, Ashish, et al.
Published: (2024)

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
by: Cao, Bochuan, et al.
Published: (2025)

LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification
by: Tang, Fengxiao, et al.
Published: (2025)

Cyber-Zero: Training Cybersecurity Agents without Runtime
by: Zhuo, Terry Yue, et al.
Published: (2025)

The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration
by: Patil, Vaidehi, et al.
Published: (2025)

Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
by: Shafee, Samaneh, et al.
Published: (2024)

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
by: Huang, Ruixuan, et al.
Published: (2025)

MBTSAD: Mitigating Backdoors in Language Models Based on Token Splitting and Attention Distillation
by: Ding, Yidong, et al.
Published: (2025)

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
by: Zhang, Junbo, et al.
Published: (2025)

Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
by: Wang, Jiongxiao, et al.
Published: (2024)

Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation
by: Kwartler, Ted, et al.
Published: (2024)

Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors
by: Yin, Rui, et al.
Published: (2026)

Fingerprinting LLMs via Prompt Injection
by: Hu, Yuepeng, et al.
Published: (2025)

LLMs for Domain Generation Algorithm Detection
by: La O, Reynier Leyva, et al.
Published: (2024)

Overriding Safety protections of Open-source Models
by: Kumar, Sachin
Published: (2024)

SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks
by: He, Xuanli, et al.
Published: (2024)

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness
by: Huang, Baizhou, et al.
Published: (2024)

Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments
by: Rigaki, Maria, et al.
Published: (2023)

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)

SoK: Are Watermarks in LLMs Ready for Deployment?
by: Dang, Kieu, et al.
Published: (2025)

PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
by: Hughes, Anthony, et al.
Published: (2025)

Do Prompts Guarantee Safety? Mitigating Toxicity from LLM Generations through Subspace Intervention
by: Singh, Himanshu, et al.
Published: (2026)

Towards Principled Analysis and Mitigation of Space Cyber Risks
by: Ear, Ekzhin
Published: (2025)

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation
by: Wu, Yiran, et al.
Published: (2025)

Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models
by: Fieblinger, Romy, et al.
Published: (2024)