| Main Authors: | Xhonneux, Sophie, Dobre, David, Tang, Jian, Gidel, Gauthier, Sridhar, Dhanya |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.05723 |
Similar Items
A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens
by: Dobre, David, et al.
Published: (2025)
Efficient Adversarial Training in LLMs with Continuous Attacks
by: Xhonneux, Sophie, et al.
Published: (2024)
LLM-Safety Evaluations Lack Robustness
by: Beyer, Tim, et al.
Published: (2025)
Learning diverse attacks on large language models for robust red-teaming and safety tuning
by: Lee, Seanie, et al.
Published: (2024)
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
by: Schwinn, Leo, et al.
Published: (2024)
Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?
by: Sasnauskas, Paulius, et al.
Published: (2025)
A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems
by: Venugopalan, Sarad, et al.
Published: (2025)
PAC-Private Responses with Adversarial Composition
by: Zhu, Xiaochen, et al.
Published: (2026)
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation
by: Tang, Xinyu, et al.
Published: (2023)
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
by: Wang, Tony T., et al.
Published: (2023)
SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?
by: Lee, Hwiwon, et al.
Published: (2026)
Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens
by: Anwar, Usman, et al.
Published: (2024)
Assessing the Effectiveness of Membership Inference on Generative Music
by: Chow, Kurtis, et al.
Published: (2025)
CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation
by: Kirdi, Sadek Misto, et al.
Published: (2024)
Federated Learning Nodes Can Reconstruct Peers' Image Data
by: Wilson, Ethan, et al.
Published: (2024)
Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks
by: Cretu, Ana-Maria, et al.
Published: (2024)
Can LLMs Handle WebShell Detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework
by: Han, Feijiang, et al.
Published: (2025)
Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows
by: Lin, Jie, et al.
Published: (2025)
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
by: Yang, Jinluan, et al.
Published: (2024)
RAPID: Robust APT Detection and Investigation Using Context-Aware Deep Learning
by: Amaru, Yonatan, et al.
Published: (2024)
Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering
by: Kulkarni, Tejas, et al.
Published: (2026)
Privacy Re-identification Attacks on Tabular GANs
by: Alshantti, Abdallah, et al.
Published: (2024)
A Survey of Source Code Representations for Machine Learning-Based Cybersecurity Tasks
by: Casey, Beatrice, et al.
Published: (2024)
Can Copyright be Reduced to Privacy?
by: Elkin-Koren, Niva, et al.
Published: (2023)
Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation
by: Guo, Wenkai, et al.
Published: (2025)
Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases
by: Cui, Ziyao, et al.
Published: (2025)
Can LLMs Patch Security Issues?
by: Alrashedy, Kamel, et al.
Published: (2023)
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
by: Abdali, Sara, et al.
Published: (2024)
Can sparsity improve the privacy of neural networks?
by: Gonon, Antoine, et al.
Published: (2023)
Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control
by: Carvalho, Tânia, et al.
Published: (2022)
Federated Learning based Latent Factorization of Tensors for Privacy-Preserving QoS Prediction
by: Zhong, Shuai, et al.
Published: (2024)
Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy
by: Romijnders, Rob, et al.
Published: (2024)
General Causal Imputation via Synthetic Interventions
by: Jiralerspong, Marco, et al.
Published: (2024)
DeepShare: Sharing ReLU Across Channels and Layers for Efficient Private Inference
by: Bornfeld, Yonathan, et al.
Published: (2025)
SoK: Can Trajectory Generation Combine Privacy and Utility?
by: Buchholz, Erik, et al.
Published: (2024)
Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features
by: Beznosikov, Aleksandr, et al.
Published: (2023)
How Well Can Differential Privacy Be Audited in One Run?
by: Keinan, Amit, et al.
Published: (2025)
Forgetting Similar Samples: Can Machine Unlearning Do it Better?
by: Xu, Heng, et al.
Published: (2026)
Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)
ReLATE: Resilient Learner Selection for Multivariate Time-Series Classification Against Adversarial Attacks
by: Kocal, Cagla Ipek, et al.
Published: (2025)