| Main Authors: | Li, Taoran, Chandrasekaran, Varun, Yu, Zhiyuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.22562 |
Similar Items
The Erasure Illusion: Stress-Testing the Generalization of LLM Forgetting Evaluation
by: Jia, Hengrui, et al.
Published: (2025)
SoK: Understanding (New) Security Issues Across AI4Code Use Cases
by: Wu, Qilong, et al.
Published: (2025)
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
by: Wang, Xiangwen, et al.
Published: (2026)
The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis
by: Wu, Qilong, et al.
Published: (2024)
Privately Aligning Language Models with Reinforcement Learning
by: Wu, Fan, et al.
Published: (2023)
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
by: Zhang, Jingxuan, et al.
Published: (2025)
AMUN: Adversarial Machine UNlearning
by: Ebrahimpour-Boroojeny, Ali, et al.
Published: (2025)
Zk-SNARK for String Match
by: Li, Taoran, et al.
Published: (2025)
Bypassing LLM Watermarks with Color-Aware Substitutions
by: Wu, Qilong, et al.
Published: (2024)
Challenges in Enabling Private Data Valuation
by: Fu, Yiwei, et al.
Published: (2026)
A Survey On Secure Machine Learning
by: Liao, Taobo, et al.
Published: (2025)
Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
by: Li, Shuai, et al.
Published: (2023)
ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
by: Zhao, Yunhan, et al.
Published: (2026)
Empirical Evaluation of Memory-Erasure Protocols
by: Gil-Pons, Reynaldo, et al.
Published: (2025)
Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models
by: Xu, Zhiyuan, et al.
Published: (2025)
PREE: Towards Harmless and Adaptive Fingerprint Editing in Large Language Models via Knowledge Prefix Enhancement
by: Yue, Xubin, et al.
Published: (2025)
AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
by: Li, Fengpeng, et al.
Published: (2026)
Rethinking Robust Adversarial Concept Erasure in Diffusion Models
by: Yin, Qinghong, et al.
Published: (2025)
R-CoT: A Reasoning-Layer Watermark via Redundant Chain-of-Thought in Large Language Models
by: Zhang, Ziming, et al.
Published: (2026)
KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models
by: Fang, Ce, et al.
Published: (2026)
Transferring Backdoors between Large Language Models by Knowledge Distillation
by: Cheng, Pengzhou, et al.
Published: (2024)
Safety Layers in Aligned Large Language Models: The Key to LLM Security
by: Li, Shen, et al.
Published: (2024)
A Survey: Towards Privacy and Security in Mobile Large Language Models
by: Xu, Honghui, et al.
Published: (2025)
An Automated Attack Investigation Approach Leveraging Threat-Knowledge-Augmented Large Language Models
by: Dai, Rujie, et al.
Published: (2025)
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
by: Yu, Zhiyuan, et al.
Published: (2024)
zkLLM: Zero Knowledge Proofs for Large Language Models
by: Sun, Haochen, et al.
Published: (2024)
LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings
by: Li, Ting, et al.
Published: (2025)
Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
by: Li, Songze, et al.
Published: (2026)
Software-Based Memory Erasure with relaxed isolation requirements: Extended Version
by: Bursuc, Sergiu, et al.
Published: (2024)
AEGIS: No Tool Call Left Unchecked -- A Pre-Execution Firewall and Audit Layer for AI Agents
by: Yuan, Aojie, et al.
Published: (2026)
DP-FedLoRA: Privacy-Enhanced Federated Fine-Tuning for On-Device Large Language Models
by: Xu, Honghui, et al.
Published: (2025)
Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models
by: Xiong, Chen, et al.
Published: (2026)
CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models
by: Zhao, Mengnan, et al.
Published: (2026)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning
by: Song, Baogang, et al.
Published: (2025)
BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models
by: Rachapudi, Jagadeesh, et al.
Published: (2026)
EditMark: Watermarking Large Language Models based on Model Editing
by: Li, Shuai, et al.
Published: (2025)
Practical Reasoning Interruption Attacks on Reasoning Large Language Models
by: Cui, Yu, et al.
Published: (2025)
PortGPT: Towards Automated Backporting Using Large Language Models
by: Li, Zhaoyang, et al.
Published: (2025)
AICrypto: Evaluating Cryptography Capabilities of Large Language Models
by: Wang, Yu, et al.
Published: (2025)