| Main Authors: | Rezkellah, Fatmazohra, Dakhmouche, Ramzi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.03567 |
Similar Items
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)
Exploring the Robustness of In-Context Learning with Noisy Labels
by: Cheng, Chen, et al.
Published: (2024)
The Utility and Complexity of in- and out-of-Distribution Machine Unlearning
by: Allouah, Youssef, et al.
Published: (2024)
Secure LLM Fine-Tuning via Safety-Aware Probing
by: Wu, Chengcan, et al.
Published: (2025)
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
by: Zhang, Zhixin, et al.
Published: (2025)
Boosting Jailbreak Attack with Momentum
by: Zhang, Yihao, et al.
Published: (2024)
RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
by: Wei, Zeming, et al.
Published: (2026)
GSE: Group-wise Sparse and Explainable Adversarial Attacks
by: Sadiku, Shpresim, et al.
Published: (2023)
UCD: Unlearning in LLMs via Contrastive Decoding
by: Suriyakumar, Vinith M., et al.
Published: (2025)
An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)
Differential Privacy via Distributionally Robust Optimization
by: Selvi, Aras, et al.
Published: (2023)
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)
SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing
by: Sun, Jialong, et al.
Published: (2026)
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
by: Jang, Yeonwoo, et al.
Published: (2025)
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)
Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects
by: Li, Na, et al.
Published: (2024)
Efficient Optimization Algorithms for Linear Adversarial Training
by: Ribeiro, Antônio H., et al.
Published: (2024)
Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs
by: Chen, Yen-Shan, et al.
Published: (2026)
Second-Order Min-Max Optimization with Lazy Hessians
by: Chen, Lesi, et al.
Published: (2024)
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
by: Ribeiro, Antônio H., et al.
Published: (2025)
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)
k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
by: Hou, Abe Bohan, et al.
Published: (2024)
SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models
by: Dabiriaghdam, Amirhossein, et al.
Published: (2025)
Identifying and Understanding Cross-Class Features in Adversarial Training
by: Wei, Zeming, et al.
Published: (2025)
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2025)
Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks
by: Wang, Rui, et al.
Published: (2025)
Machine Unlearning Fails to Remove Data Poisoning Attacks
by: Pawelczyk, Martin, et al.
Published: (2024)
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
by: Di, Jimmy Z., et al.
Published: (2022)
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
by: Zhang, Yihao, et al.
Published: (2024)
Smoothed Normalization for Efficient Distributed Private Optimization
by: Shulgin, Egor, et al.
Published: (2025)
Residual-Evasive Attacks on ADMM in Distributed Optimization
by: Bruckmeier, Sabrina, et al.
Published: (2025)
FedADMM-InSa: An Inexact and Self-Adaptive ADMM for Federated Learning
by: Song, Yongcun, et al.
Published: (2024)
The Privacy Power of Correlated Noise in Decentralized Learning
by: Allouah, Youssef, et al.
Published: (2024)
SafeCOMM: A Study on Safety Degradation in Fine-Tuned Telecom Large Language Models
by: Djuhera, Aladin, et al.
Published: (2025)
BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers
by: Xue, Jiaqi, et al.
Published: (2024)
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
by: An, Bang, et al.
Published: (2024)
In the Name of Fairness: Assessing the Bias in Clinical Record De-identification
by: Xiao, Yuxin, et al.
Published: (2023)
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
by: Zhu, Kaijie, et al.
Published: (2023)
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
by: Brown, Hannah, et al.
Published: (2024)