Bibliographic Details
Main Authors: Ravichander, Abhilasha, Fisher, Jillian, Sorensen, Taylor, Lu, Ximing, Lin, Yuchen, Antoniak, Maria, Mireshghallah, Niloofar, Bhagavatula, Chandra, Choi, Yejin
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2503.12072
author Ravichander, Abhilasha
Fisher, Jillian
Sorensen, Taylor
Lu, Ximing
Lin, Yuchen
Antoniak, Maria
Mireshghallah, Niloofar
Bhagavatula, Chandra
Choi, Yejin
contents High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. This lack of transparency creates multiple challenges: it limits external oversight and inspection of LLMs for issues such as copyright infringement, it undermines the agency of data authors, and it hinders scientific research on critical issues such as data contamination and data selection. How can we recover what training data is known to LLMs? In this work, we demonstrate a new method to identify training data known to proprietary LLMs like GPT-4 without requiring any access to model weights or token probabilities, by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes. By evaluating a model's ability to successfully reconstruct high-surprisal tokens in text, we can identify a surprising number of texts memorized by LLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2503_12072
institution arXiv
publishDate 2025
record_format arxiv
title Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
topic Computation and Language
url https://arxiv.org/abs/2503.12072
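
The probing idea summarized in the abstract (mask the high-surprisal tokens of a passage, then check whether a black-box model can reconstruct them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: surprisal is estimated with a smoothed unigram model over a small reference corpus, and `fill_fn` is a hypothetical stand-in for a text-only query to the model under test. The record does not say how the paper estimates surprisal or scores reconstructions.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

MASK = "[MASK]"  # hypothetical placeholder token for masked positions


def token_surprisals(tokens: List[str], reference: List[str]) -> List[float]:
    """Surprisal in bits under an add-one-smoothed unigram model (an assumption)."""
    counts = Counter(reference)
    total = len(reference)
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    return [-math.log2((counts[t] + 1) / (total + vocab)) for t in tokens]


def build_probe(
    tokens: List[str], reference: List[str], k: int
) -> Tuple[List[str], List[str]]:
    """Mask the k highest-surprisal tokens; return the masked passage and the targets."""
    s = token_surprisals(tokens, reference)
    # Rank by descending surprisal, breaking ties by position for determinism.
    ranked = sorted(range(len(tokens)), key=lambda i: (-s[i], i))
    positions = sorted(ranked[:k])
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = [tokens[i] for i in positions]
    return masked, targets


def reconstruction_rate(
    masked: List[str],
    targets: List[str],
    fill_fn: Callable[[List[str]], List[str]],
) -> float:
    """Fraction of masked tokens that fill_fn reproduces exactly, in order."""
    if not targets:
        return 0.0
    guesses = fill_fn(masked)
    hits = sum(g == t for g, t in zip(guesses, targets))
    return hits / len(targets)


# Toy data: rare words in the passage receive the highest surprisal.
reference = "the cat sat on the mat and the dog sat on the rug".split()
passage = "the zebra sat on the xylophone".split()
masked, targets = build_probe(passage, reference, k=2)

# Two hypothetical models: one that has memorized the passage, one that has not.
memorized = lambda m: ["zebra", "xylophone"]
unseen = lambda m: ["cat", "mat"]

print(masked, targets)
print(reconstruction_rate(masked, targets, memorized))
print(reconstruction_rate(masked, targets, unseen))
```

The intuition behind the design is that tokens a generic model finds predictable can be guessed without ever having seen the passage, so a high reconstruction rate on the unpredictable tokens is stronger evidence of memorization. In a real setting `fill_fn` would wrap a prompt to the model being probed, and the surprisal estimate would come from a stronger language model than a unigram count.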