| Main Authors: | Spracklen, Joseph; Aghazadeh, Pedram; Koushanfar, Farinaz; Jadliwala, Murtuza |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.01047 |
Similar Items
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
by: Spracklen, Joseph, et al.
Published: (2024)
Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications
by: Jandali, Yaman, et al.
Published: (2025)
Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign
by: Zhang, Ruisi, et al.
Published: (2025)
Prompt and Circumstances: Evaluating the Efficacy of Human Prompt Inference in AI-Generated Art
by: Trinh, Khoi, et al.
Published: (2026)
Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM
by: Wu, Xiaoyu, et al.
Published: (2025)
VocalBridge: Latent Diffusion-Bridge Purification for Defeating Perturbation-Based Voiceprint Defenses
by: Abbasihafshejani, Maryam, et al.
Published: (2026)
From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
by: Xu, Xiaoyu, et al.
Published: (2026)
Props for Machine-Learning Security
by: Juels, Ari, et al.
Published: (2024)
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2026)
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2025)
Textual Unlearning Gives a False Sense of Unlearning
by: Du, Jiacheng, et al.
Published: (2024)
EmMark: Robust Watermarks for IP Protection of Embedded Quantized Large Language Models
by: Zhang, Ruisi, et al.
Published: (2024)
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
by: Zhang, Ruisi, et al.
Published: (2024)
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
by: Shumailov, Ilia, et al.
Published: (2024)
Beyond Perplexity: A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning
by: Shabgahi, Soheil Zibakhsh, et al.
Published: (2026)
Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
by: Huo, Mingjia, et al.
Published: (2024)
Forgetting-MarI: LLM Unlearning via Marginal Information Regularization
by: Xu, Shizhou, et al.
Published: (2025)
SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
by: Muhamed, Aashiq, et al.
Published: (2025)
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)
An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)
Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)
Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)
FIT to Forget: Robust Continual Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2026)
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2025)
Using Hallucinations to Bypass GPT4's Filter
by: Lemkin, Benjamin
Published: (2024)
Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting
by: To, Bang Trinh Tran, et al.
Published: (2025)
REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
by: Zhang, Ruisi, et al.
Published: (2023)
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
by: Shabgahi, Soheil Zibakhsh, et al.
Published: (2025)
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)
AttestLLM: Efficient Attestation Framework for Billion-scale On-device LLMs
by: Zhang, Ruisi, et al.
Published: (2025)
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
by: Jang, Yeonwoo, et al.
Published: (2025)
In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
by: Shetty, Anudeex, et al.
Published: (2026)
SWaRL: Safeguard Code Watermarking via Reinforcement Learning
by: Javidnia, Neusha, et al.
Published: (2026)
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)
Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
by: Tang, Haochun, et al.
Published: (2026)
Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning
by: Ahmed, Nesreen K., et al.
Published: (2026)
Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking
by: Xu, Yijie, et al.
Published: (2025)
IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization
by: Frikha, Ahmed, et al.
Published: (2024)
PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
by: Nakka, Krishna Kanth, et al.
Published: (2024)