Library Catalog

Bibliographic Details
Main Authors:	Rezkellah, Fatmazohra; Dakhmouche, Ramzi
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language; Cryptography and Security; Computers and Society; Optimization and Control
Online Access:	https://arxiv.org/abs/2510.03567

Similar Items

Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)

Exploring the Robustness of In-Context Learning with Noisy Labels
by: Cheng, Chen, et al.
Published: (2024)

The Utility and Complexity of in- and out-of-Distribution Machine Unlearning
by: Allouah, Youssef, et al.
Published: (2024)

Secure LLM Fine-Tuning via Safety-Aware Probing
by: Wu, Chengcan, et al.
Published: (2025)

Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
by: Zhang, Zhixin, et al.
Published: (2025)

Boosting Jailbreak Attack with Momentum
by: Zhang, Yihao, et al.
Published: (2024)

RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
by: Wei, Zeming, et al.
Published: (2026)

GSE: Group-wise Sparse and Explainable Adversarial Attacks
by: Sadiku, Shpresim, et al.
Published: (2023)

UCD: Unlearning in LLMs via Contrastive Decoding
by: Suriyakumar, Vinith M., et al.
Published: (2025)

An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)

Differential Privacy via Distributionally Robust Optimization
by: Selvi, Aras, et al.
Published: (2023)

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
by: Xu, Xiaoyu, et al.
Published: (2025)

SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing
by: Sun, Jialong, et al.
Published: (2026)

Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
by: Jang, Yeonwoo, et al.
Published: (2025)

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)

Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects
by: Li, Na, et al.
Published: (2024)

Efficient Optimization Algorithms for Linear Adversarial Training
by: Ribeiro, Antônio H., et al.
Published: (2024)

Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs
by: Chen, Yen-Shan, et al.
Published: (2026)

Second-Order Min-Max Optimization with Lazy Hessians
by: Chen, Lesi, et al.
Published: (2024)

Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
by: Ribeiro, Antônio H., et al.
Published: (2025)

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)

k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
by: Hou, Abe Bohan, et al.
Published: (2024)

SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models
by: Dabiriaghdam, Amirhossein, et al.
Published: (2025)

Identifying and Understanding Cross-Class Features in Adversarial Training
by: Wei, Zeming, et al.
Published: (2025)

Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
by: Xu, Xiaoyu, et al.
Published: (2025)

Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks
by: Wang, Rui, et al.
Published: (2025)

Machine Unlearning Fails to Remove Data Poisoning Attacks
by: Pawelczyk, Martin, et al.
Published: (2024)

Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
by: Di, Jimmy Z., et al.
Published: (2022)

On the Duality Between Sharpness-Aware Minimization and Adversarial Training
by: Zhang, Yihao, et al.
Published: (2024)

Smoothed Normalization for Efficient Distributed Private Optimization
by: Shulgin, Egor, et al.
Published: (2025)

Residual-Evasive Attacks on ADMM in Distributed Optimization
by: Bruckmeier, Sabrina, et al.
Published: (2025)

FedADMM-InSa: An Inexact and Self-Adaptive ADMM for Federated Learning
by: Song, Yongcun, et al.
Published: (2024)

The Privacy Power of Correlated Noise in Decentralized Learning
by: Allouah, Youssef, et al.
Published: (2024)

SafeCOMM: A Study on Safety Degradation in Fine-Tuned Telecom Large Language Models
by: Djuhera, Aladin, et al.
Published: (2025)

BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers
by: Xue, Jiaqi, et al.
Published: (2024)

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
by: An, Bang, et al.
Published: (2024)

In the Name of Fairness: Assessing the Bias in Clinical Record De-identification
by: Xiao, Yuxin, et al.
Published: (2023)

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
by: Zhu, Kaijie, et al.
Published: (2023)

Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
by: Brown, Hannah, et al.
Published: (2024)