MARC21: Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models :: Library Catalog

Bibliographic Details
Main authors:	Flemings, James; Jiang, Bo; Zhang, Wanrong; Takhirov, Zafar; Annavaram, Murali
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online access:	https://arxiv.org/abs/2410.03026

_version_	1866912765928538112
author	Flemings, James; Jiang, Bo; Zhang, Wanrong; Takhirov, Zafar; Annavaram, Murali
author_facet	Flemings, James; Jiang, Bo; Zhang, Wanrong; Takhirov, Zafar; Annavaram, Murali
contents	Language models (LMs) rely on their parametric knowledge augmented with relevant contextual knowledge for certain tasks, such as question answering. However, the contextual knowledge can contain private information that may be leaked when answering queries, and estimating this privacy leakage is not well understood. A straightforward approach of directly comparing an LM's output to the contexts can overestimate the privacy risk, since the LM's parametric knowledge might already contain the augmented contextual knowledge. To this end, we introduce context influence, a metric that builds on differential privacy, a widely-adopted privacy notion, to estimate the privacy leakage of contextual knowledge during decoding. Our approach effectively measures how each subset of the context influences an LM's response while separating the specific parametric knowledge of the LM. Using our context influence metric, we demonstrate that context privacy leakage occurs when contextual knowledge is out of distribution with respect to parametric knowledge. Moreover, we experimentally demonstrate how context influence properly attributes the privacy leakage to augmented contexts, and we evaluate how factors -- such as model size, context size, generation position, etc. -- affect context privacy leakage. The practical implications of our results will inform practitioners of the privacy risk associated with augmented contextual knowledge.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_03026
institution	arXiv
publishDate	2024
record_format	arxiv
title	Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2410.03026

Similar Items