| Main Authors: | Lin, Sharon, Krishnamurthy, Dvijotham, Hayes, Jamie, Shi, Chongyang, Shumailov, Ilia, Song, Shuang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.17578 |
Similar Items
Buffer Overflow in Mixture of Experts
by: Hayes, Jamie, et al.
Published: (2024)
Interpreting the Repeated Token Phenomenon in Large Language Models
by: Yona, Itay, et al.
Published: (2025)
Beyond Slow Signs in High-fidelity Model Extraction
by: Foerster, Hanna, et al.
Published: (2024)
Measuring memorization in RLHF for code completion
by: Pappu, Aneesh, et al.
Published: (2024)
Cascading Adversarial Bias from Injection to Distillation in Language Models
by: Chaudhari, Harsh, et al.
Published: (2025)
Stealing User Prompts from Mixture of Experts
by: Yona, Itay, et al.
Published: (2024)
Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy
by: Hayes, Jamie, et al.
Published: (2024)
Soft Instruction De-escalation Defense
by: Walter, Nils Philipp, et al.
Published: (2025)
Lessons from Defending Gemini Against Indirect Prompt Injections
by: Shi, Chongyang, et al.
Published: (2025)
Quantamination: Dynamic Quantization Leaks Your Data Across the Batch
by: Foerster, Hanna, et al.
Published: (2026)
Demystifying Verbatim Memorization in Large Language Models
by: Huang, Jing, et al.
Published: (2024)
Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models
by: Chaudhari, Harsh, et al.
Published: (2026)
Locking Machine Learning Models into Hardware
by: Clifford, Eleanor, et al.
Published: (2024)
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
by: Foerster, Hanna, et al.
Published: (2025)
Achieving the Tightest Relaxation of Sigmoids for Formal Verification
by: Chevalier, Samuel, et al.
Published: (2024)
Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias
by: Wyllie, Sierra, et al.
Published: (2024)
Norm-Bounded Low-Rank Adaptation
by: Wang, Ruigang, et al.
Published: (2025)
Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks
by: Wang, Ruigang, et al.
Published: (2024)
SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks
by: Gao, Yue, et al.
Published: (2023)
Machine Learning needs Better Randomness Standards: Randomised Smoothing and PRNG-based attacks
by: Dahiya, Pranav, et al.
Published: (2023)
Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation
by: Küchler, Nicolas, et al.
Published: (2025)
Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots
by: Vero, Mark, et al.
Published: (2026)
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
by: Bose, Avinandan, et al.
Published: (2025)
Measuring memorization in language models via probabilistic extraction
by: Hayes, Jamie, et al.
Published: (2024)
ceLLMate: Sandboxing Browser AI Agents
by: Meng, Luoxi, et al.
Published: (2025)
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
by: Shumailov, Ilia, et al.
Published: (2024)
Watermarking Needs Input Repetition Masking
by: Khachaturov, David, et al.
Published: (2025)
The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections
by: Nasr, Milad, et al.
Published: (2025)
Beyond Labeling Oracles: What does it mean to steal ML models?
by: Shafran, Avital, et al.
Published: (2023)
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
by: Chen, Tong, et al.
Published: (2025)
Machine Learning Models Have a Supply Chain Problem
by: Meiklejohn, Sarah, et al.
Published: (2025)
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
by: Kim, Kyuyoung, et al.
Published: (2024)
Hardware and Software Platform Inference
by: Zhang, Cheng, et al.
Published: (2024)
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)
Provably Bounding Neural Network Preimages
by: Kotha, Suhas, et al.
Published: (2023)
Beyond Laplace and Gaussian: Exploring the Generalized Gaussian Mechanism for Private Machine Learning
by: Rinberg, Roy, et al.
Published: (2025)
ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks
by: Clifford, Eleanor, et al.
Published: (2022)
When Vision Fails: Text Attacks Against ViT and OCR
by: Boucher, Nicholas, et al.
Published: (2023)
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
by: Kazdan, Joshua, et al.
Published: (2025)
Learning to Receive Help: Intervention-Aware Concept Embedding Models
by: Zarlenga, Mateo Espinosa, et al.
Published: (2023)