| Main Authors: | Morris, John X., Sitawarin, Chawin, Guo, Chuan, Kokhlikyan, Narine, Suh, G. Edward, Rush, Alexander M., Chaudhuri, Kamalika, Mahloujifar, Saeed |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.24832 |
Similar Items
Privacy Blur: Quantifying Privacy and Utility for Image Data Release
by: Mahloujifar, Saeed, et al.
Published: (2025)
Z0-Inf: Zeroth Order Approximation for Data Influence
by: Kokhlikyan, Narine, et al.
Published: (2025)
Machine Learning with Privacy for Protected Attributes
by: Mahloujifar, Saeed, et al.
Published: (2025)
CIMemories: A Compositional Benchmark for Contextual Integrity of Persistent Memory in LLMs
by: Mireshghallah, Niloofar, et al.
Published: (2025)
Measuring Déjà vu Memorization Efficiently
by: Kokhlikyan, Narine, et al.
Published: (2025)
Auditing $f$-Differential Privacy in One Run
by: Mahloujifar, Saeed, et al.
Published: (2024)
RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection
by: Wen, Yuxin, et al.
Published: (2025)
Privacy Amplification for the Gaussian Mechanism via Bounded Support
by: Hu, Shengyuan, et al.
Published: (2024)
SecAlign: Defending Against Prompt Injection with Preference Optimization
by: Chen, Sizhe, et al.
Published: (2024)
Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds
by: Chaudhuri, Kamalika, et al.
Published: (2024)
PAL: Proxy-Guided Black-Box Attack on Large Language Models
by: Sitawarin, Chawin, et al.
Published: (2024)
Contextual Document Embeddings
by: Morris, John X., et al.
Published: (2024)
Positional Embedding-Aware Activations
by: Shah, Kathan, et al.
Published: (2023)
Mark My Words: Analyzing and Evaluating Language Model Watermarks
by: Piet, Julien, et al.
Published: (2023)
Vulnerability Detection with Code Language Models: How Far Are We?
by: Ding, Yangruibo, et al.
Published: (2024)
Data Redaction from Conditional Generative Models
by: Kong, Zhifeng, et al.
Published: (2023)
Disentangling generalization and memorization in large language models using chess
by: Pleiss, Leonard S., et al.
Published: (2026)
Learning to Reason in 13 Parameters
by: Morris, John X., et al.
Published: (2026)
Learning-Time Encoding Shapes Unlearning in LLMs
by: Wu, Ruihan, et al.
Published: (2025)
Do language models plan ahead for future tokens?
by: Wu, Wilson, et al.
Published: (2024)
Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning
by: Wannan, et al.
Published: (2025)
On Symmetries in Convolutional Weights
by: Alsallakh, Bilal, et al.
Published: (2025)
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
by: Piet, Julien, et al.
Published: (2023)
Déjà Vu Memorization in Vision-Language Models
by: Jayaraman, Bargav, et al.
Published: (2024)
Approximating Language Model Training Data from Weights
by: Morris, John X., et al.
Published: (2025)
Evaluating Deep Unlearning in Large Language Models
by: Wu, Ruihan, et al.
Published: (2024)
Extracting memorized pieces of (copyrighted) books from open-weight language models
by: Cooper, A. Feder, et al.
Published: (2025)
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
by: Li, Lin, et al.
Published: (2023)
StruQ: Defending Against Prompt Injection with Structured Queries
by: Chen, Sizhe, et al.
Published: (2024)
Exploring prompts to elicit memorization in masked language model-based named entity recognition
by: Xia, Yuxi, et al.
Published: (2024)
How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu
by: Akera, Benjamin, et al.
Published: (2025)
Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy
by: Koga, Tatsuki, et al.
Published: (2024)
Can We Infer Confidential Properties of Training Data from LLMs?
by: Huang, Pengrun, et al.
Published: (2025)
Private Fine-tuning of Large Language Models with Zeroth-order Optimization
by: Tang, Xinyu, et al.
Published: (2024)
Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations
by: Tomani, Christian, et al.
Published: (2024)
Unlocking Visual Secrets: Inverting Features with Diffusion Priors for Image Reconstruction
by: Zhang, Sai Qian, et al.
Published: (2024)
How much do contextualized representations encode long-range context?
by: Sun, Simeng, et al.
Published: (2024)
Publicly-Detectable Watermarking for Language Models
by: Fairoze, Jaiden, et al.
Published: (2023)
PubDef: Defending Against Transfer Attacks From Public Models
by: Sitawarin, Chawin, et al.
Published: (2023)
Defending Against Prompt Injection With a Few DefensiveTokens
by: Chen, Sizhe, et al.
Published: (2025)