Library Catalog

Bibliographic Details
Main Authors:	Lin, Sharon; Krishnamurthy, Dvijotham; Hayes, Jamie; Shi, Chongyang; Shumailov, Ilia; Song, Shuang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.17578

Similar Items

Buffer Overflow in Mixture of Experts
by: Hayes, Jamie, et al.
Published: (2024)

Interpreting the Repeated Token Phenomenon in Large Language Models
by: Yona, Itay, et al.
Published: (2025)

Beyond Slow Signs in High-fidelity Model Extraction
by: Foerster, Hanna, et al.
Published: (2024)

Measuring memorization in RLHF for code completion
by: Pappu, Aneesh, et al.
Published: (2024)

Cascading Adversarial Bias from Injection to Distillation in Language Models
by: Chaudhari, Harsh, et al.
Published: (2025)

Stealing User Prompts from Mixture of Experts
by: Yona, Itay, et al.
Published: (2024)

Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy
by: Hayes, Jamie, et al.
Published: (2024)

Soft Instruction De-escalation Defense
by: Walter, Nils Philipp, et al.
Published: (2025)

Lessons from Defending Gemini Against Indirect Prompt Injections
by: Shi, Chongyang, et al.
Published: (2025)

Quantamination: Dynamic Quantization Leaks Your Data Across the Batch
by: Foerster, Hanna, et al.
Published: (2026)

Demystifying Verbatim Memorization in Large Language Models
by: Huang, Jing, et al.
Published: (2024)

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models
by: Chaudhari, Harsh, et al.
Published: (2026)

Locking Machine Learning Models into Hardware
by: Clifford, Eleanor, et al.
Published: (2024)

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
by: Foerster, Hanna, et al.
Published: (2025)

Achieving the Tightest Relaxation of Sigmoids for Formal Verification
by: Chevalier, Samuel, et al.
Published: (2024)

Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias
by: Wyllie, Sierra, et al.
Published: (2024)

Norm-Bounded Low-Rank Adaptation
by: Wang, Ruigang, et al.
Published: (2025)

Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks
by: Wang, Ruigang, et al.
Published: (2024)

SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks
by: Gao, Yue, et al.
Published: (2023)

Machine Learning needs Better Randomness Standards: Randomised Smoothing and PRNG-based attacks
by: Dahiya, Pranav, et al.
Published: (2023)

Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation
by: Küchler, Nicolas, et al.
Published: (2025)

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots
by: Vero, Mark, et al.
Published: (2026)

Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
by: Bose, Avinandan, et al.
Published: (2025)

Measuring memorization in language models via probabilistic extraction
by: Hayes, Jamie, et al.
Published: (2024)

ceLLMate: Sandboxing Browser AI Agents
by: Meng, Luoxi, et al.
Published: (2025)

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
by: Shumailov, Ilia, et al.
Published: (2024)

Watermarking Needs Input Repetition Masking
by: Khachaturov, David, et al.
Published: (2025)

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections
by: Nasr, Milad, et al.
Published: (2025)

Beyond Labeling Oracles: What does it mean to steal ML models?
by: Shafran, Avital, et al.
Published: (2023)

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
by: Chen, Tong, et al.
Published: (2025)

Machine Learning Models Have a Supply Chain Problem
by: Meiklejohn, Sarah, et al.
Published: (2025)

Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
by: Kim, Kyuyoung, et al.
Published: (2024)

Hardware and Software Platform Inference
by: Zhang, Cheng, et al.
Published: (2024)

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)

Provably Bounding Neural Network Preimages
by: Kotha, Suhas, et al.
Published: (2023)

Beyond Laplace and Gaussian: Exploring the Generalized Gaussian Mechanism for Private Machine Learning
by: Rinberg, Roy, et al.
Published: (2025)

ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks
by: Clifford, Eleanor, et al.
Published: (2022)

When Vision Fails: Text Attacks Against ViT and OCR
by: Boucher, Nicholas, et al.
Published: (2023)

No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
by: Kazdan, Joshua, et al.
Published: (2025)

Learning to Receive Help: Intervention-Aware Concept Embedding Models
by: Zarlenga, Mateo Espinosa, et al.
Published: (2023)