Staff View: Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Ravichander, Abhilasha; Fisher, Jillian; Sorensen, Taylor; Lu, Ximing; Lin, Yuchen; Antoniak, Maria; Mireshghallah, Niloofar; Bhagavatula, Chandra; Choi, Yejin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.12072

_version_	1866929761538801664
author	Ravichander, Abhilasha; Fisher, Jillian; Sorensen, Taylor; Lu, Ximing; Lin, Yuchen; Antoniak, Maria; Mireshghallah, Niloofar; Bhagavatula, Chandra; Choi, Yejin
author_facet	Ravichander, Abhilasha; Fisher, Jillian; Sorensen, Taylor; Lu, Ximing; Lin, Yuchen; Antoniak, Maria; Mireshghallah, Niloofar; Bhagavatula, Chandra; Choi, Yejin
contents	High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. This lack of transparency creates multiple challenges: it limits external oversight and inspection of LLMs for issues such as copyright infringement, it undermines the agency of data authors, and it hinders scientific research on critical issues such as data contamination and data selection. How can we recover what training data is known to LLMs? In this work, we demonstrate a new method to identify training data known to proprietary LLMs like GPT-4 without requiring any access to model weights or token probabilities, by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes. By evaluating a model's ability to successfully reconstruct high-surprisal tokens in text, we can identify a surprising number of texts memorized by LLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_12072
institution	arXiv
publishDate	2025
record_format	arxiv
title	Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2503.12072
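
The probing idea summarized in the contents field can be illustrated with a minimal sketch. It is not the authors' implementation: the add-one-smoothed unigram model is an assumed stand-in for whatever surprisal estimator the paper uses, and `fill_fn`, `build_probe`, and `reconstruction_score` are hypothetical names introduced only for this illustration. The sketch masks the highest-surprisal tokens of a passage and measures how many a black-box model restores, using only the model's text output.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"  # placeholder symbol; an assumption of this sketch


def unigram_surprisal(tokens: Sequence[str], reference: Sequence[str]) -> List[float]:
    """Surprisal -log2 p(token) under an add-one-smoothed unigram model.

    Assumption: a unigram model stands in for a real reference language model.
    """
    counts = Counter(reference)
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    total = len(reference)
    return [-math.log2((counts[t] + 1) / (total + vocab)) for t in tokens]


def build_probe(tokens: Sequence[str], surprisals: Sequence[float], k: int) -> Tuple[List[str], List[int]]:
    """Mask the k highest-surprisal positions; return masked tokens and positions."""
    ranked = sorted(range(len(tokens)), key=lambda i: surprisals[i], reverse=True)
    positions = sorted(ranked[:k])
    chosen = set(positions)
    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    return masked, positions


def reconstruction_score(
    tokens: Sequence[str],
    masked: Sequence[str],
    positions: Sequence[int],
    fill_fn: Callable[[Sequence[str]], Sequence[str]],
) -> float:
    """Fraction of masked tokens that fill_fn restores exactly.

    fill_fn is a hypothetical wrapper around the model being probed: it
    receives the masked tokens and returns one guess per mask, in order.
    """
    if not positions:
        return 0.0
    guesses = fill_fn(masked)
    hits = sum(1 for pos, guess in zip(positions, guesses) if tokens[pos] == guess)
    return hits / len(positions)


# Toy data, invented for illustration only.
reference = "the cat sat on the mat the dog sat on the rug".split()
passage = "the zebra sat on the xylophone".split()

surprisals = unigram_surprisal(passage, reference)
masked, positions = build_probe(passage, surprisals, k=2)

# Two stand-in "models": one that has memorized the passage, one that has not.
memorized = lambda m: ["zebra", "xylophone"]
unseen = lambda m: ["cat", "mat"]

score_memorized = reconstruction_score(passage, masked, positions, memorized)
score_unseen = reconstruction_score(passage, masked, positions, unseen)
```

In this toy setup the rare words are the ones masked, so a high score is hard to reach by guessing from context alone; that is the intuition behind probing with high-surprisal tokens. A real probe would replace `fill_fn` with a call to the model under study and would need a decision threshold, neither of which this record specifies.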