:: Library Catalog

Bibliographic Details
Main Authors:	Spracklen, Joseph; Aghazadeh, Pedram; Koushanfar, Farinaz; Jadliwala, Murtuza
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2605.01047

Similar Items

We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
by: Spracklen, Joseph, et al.
Published: (2024)

Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications
by: Jandali, Yaman, et al.
Published: (2025)

Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign
by: Zhang, Ruisi, et al.
Published: (2025)

Prompt and Circumstances: Evaluating the Efficacy of Human Prompt Inference in AI-Generated Art
by: Trinh, Khoi, et al.
Published: (2026)

Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM
by: Wu, Xiaoyu, et al.
Published: (2025)

VocalBridge: Latent Diffusion-Bridge Purification for Defeating Perturbation-Based Voiceprint Defenses
by: Abbasihafshejani, Maryam, et al.
Published: (2026)

From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
by: Xu, Xiaoyu, et al.
Published: (2026)

Props for Machine-Learning Security
by: Juels, Ari, et al.
Published: (2024)

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2026)

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2025)

Textual Unlearning Gives a False Sense of Unlearning
by: Du, Jiacheng, et al.
Published: (2024)

EmMark: Robust Watermarks for IP Protection of Embedded Quantized Large Language Models
by: Zhang, Ruisi, et al.
Published: (2024)

Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
by: Zhang, Ruisi, et al.
Published: (2024)

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
by: Shumailov, Ilia, et al.
Published: (2024)

Beyond Perplexity: A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning
by: Shabgahi, Soheil Zibakhsh, et al.
Published: (2026)

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
by: Huo, Mingjia, et al.
Published: (2024)

Forgetting-MarI: LLM Unlearning via Marginal Information Regularization
by: Xu, Shizhou, et al.
Published: (2025)

SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
by: Muhamed, Aashiq, et al.
Published: (2025)

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)

An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)

Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)

Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)

FIT to Forget: Robust Continual Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2026)

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2025)

Using Hallucinations to Bypass GPT4's Filter
by: Lemkin, Benjamin
Published: (2024)

Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting
by: To, Bang Trinh Tran, et al.
Published: (2025)

REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
by: Zhang, Ruisi, et al.
Published: (2023)

MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
by: Shabgahi, Soheil Zibakhsh, et al.
Published: (2025)

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)

AttestLLM: Efficient Attestation Framework for Billion-scale On-device LLMs
by: Zhang, Ruisi, et al.
Published: (2025)

Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
by: Jang, Yeonwoo, et al.
Published: (2025)

In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
by: Shetty, Anudeex, et al.
Published: (2026)

SWaRL: Safeguard Code Watermarking via Reinforcement Learning
by: Javidnia, Neusha, et al.
Published: (2026)

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
by: Tang, Haochun, et al.
Published: (2026)

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning
by: Ahmed, Nesreen K., et al.
Published: (2026)

Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking
by: Xu, Yijie, et al.
Published: (2025)

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization
by: Frikha, Ahmed, et al.
Published: (2024)

PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
by: Nakka, Krishna Kanth, et al.
Published: (2024)