:: Library Catalog

Bibliographic Details
Main Authors:	Morris, John X., Sitawarin, Chawin, Guo, Chuan, Kokhlikyan, Narine, Suh, G. Edward, Rush, Alexander M., Chaudhuri, Kamalika, Mahloujifar, Saeed
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.24832

Similar Items

Privacy Blur: Quantifying Privacy and Utility for Image Data Release
by: Mahloujifar, Saeed, et al.
Published: (2025)

Z0-Inf: Zeroth Order Approximation for Data Influence
by: Kokhlikyan, Narine, et al.
Published: (2025)

Machine Learning with Privacy for Protected Attributes
by: Mahloujifar, Saeed, et al.
Published: (2025)

CIMemories: A Compositional Benchmark for Contextual Integrity of Persistent Memory in LLMs
by: Mireshghallah, Niloofar, et al.
Published: (2025)

Measuring Déjà vu Memorization Efficiently
by: Kokhlikyan, Narine, et al.
Published: (2025)

Auditing $f$-Differential Privacy in One Run
by: Mahloujifar, Saeed, et al.
Published: (2024)

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection
by: Wen, Yuxin, et al.
Published: (2025)

Privacy Amplification for the Gaussian Mechanism via Bounded Support
by: Hu, Shengyuan, et al.
Published: (2024)

SecAlign: Defending Against Prompt Injection with Preference Optimization
by: Chen, Sizhe, et al.
Published: (2024)

Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds
by: Chaudhuri, Kamalika, et al.
Published: (2024)

PAL: Proxy-Guided Black-Box Attack on Large Language Models
by: Sitawarin, Chawin, et al.
Published: (2024)

Contextual Document Embeddings
by: Morris, John X., et al.
Published: (2024)

Positional Embedding-Aware Activations
by: Shah, Kathan, et al.
Published: (2023)

Mark My Words: Analyzing and Evaluating Language Model Watermarks
by: Piet, Julien, et al.
Published: (2023)

Vulnerability Detection with Code Language Models: How Far Are We?
by: Ding, Yangruibo, et al.
Published: (2024)

Data Redaction from Conditional Generative Models
by: Kong, Zhifeng, et al.
Published: (2023)

Disentangling generalization and memorization in large language models using chess
by: Pleiss, Leonard S., et al.
Published: (2026)

Learning to Reason in 13 Parameters
by: Morris, John X., et al.
Published: (2026)

Learning-Time Encoding Shapes Unlearning in LLMs
by: Wu, Ruihan, et al.
Published: (2025)

Do language models plan ahead for future tokens?
by: Wu, Wilson, et al.
Published: (2024)

Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning
by: Wannan, et al.
Published: (2025)

On Symmetries in Convolutional Weights
by: Alsallakh, Bilal, et al.
Published: (2025)

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
by: Piet, Julien, et al.
Published: (2023)

Déjà Vu Memorization in Vision-Language Models
by: Jayaraman, Bargav, et al.
Published: (2024)

Approximating Language Model Training Data from Weights
by: Morris, John X., et al.
Published: (2025)

Evaluating Deep Unlearning in Large Language Models
by: Wu, Ruihan, et al.
Published: (2024)

Extracting memorized pieces of (copyrighted) books from open-weight language models
by: Cooper, A. Feder, et al.
Published: (2025)

OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
by: Li, Lin, et al.
Published: (2023)

StruQ: Defending Against Prompt Injection with Structured Queries
by: Chen, Sizhe, et al.
Published: (2024)

Exploring prompts to elicit memorization in masked language model-based named entity recognition
by: Xia, Yuxi, et al.
Published: (2024)

How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu
by: Akera, Benjamin, et al.
Published: (2025)

Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy
by: Koga, Tatsuki, et al.
Published: (2024)

Can We Infer Confidential Properties of Training Data from LLMs?
by: Huang, Pengrun, et al.
Published: (2025)

Private Fine-tuning of Large Language Models with Zeroth-order Optimization
by: Tang, Xinyu, et al.
Published: (2024)

Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations
by: Tomani, Christian, et al.
Published: (2024)

Unlocking Visual Secrets: Inverting Features with Diffusion Priors for Image Reconstruction
by: Zhang, Sai Qian, et al.
Published: (2024)

How much do contextualized representations encode long-range context?
by: Sun, Simeng, et al.
Published: (2024)

Publicly-Detectable Watermarking for Language Models
by: Fairoze, Jaiden, et al.
Published: (2023)

PubDef: Defending Against Transfer Attacks From Public Models
by: Sitawarin, Chawin, et al.
Published: (2023)

Defending Against Prompt Injection With a Few DefensiveTokens
by: Chen, Sizhe, et al.
Published: (2025)