Bibliographic Details
Main Authors: Gusain, Vaibhav; Leith, Douglas
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2410.07772
author Gusain, Vaibhav
Leith, Douglas
author_facet Gusain, Vaibhav
Leith, Douglas
contents In this paper we propose the use of a k-anonymity-like approach for evaluating the privacy of redacted text. Given a piece of redacted text, we use a state-of-the-art transformer-based deep learning network to reconstruct the original text. This generates multiple full texts that are consistent with the redacted text, i.e. which are grammatical, have the same non-redacted words, etc., and we represent each of these using an embedding vector that captures sentence similarity. In this way we can estimate the number, diversity and quality of full texts consistent with the redacted text and so evaluate privacy.
format Preprint
id arxiv_https___arxiv_org_abs_2410_07772
institution arXiv
publishDate 2024
record_format arxiv
title Towards Quantifying The Privacy Of Redacted Text
topic Machine Learning
url https://arxiv.org/abs/2410.07772
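
The counting idea in the abstract (how many sufficiently different full texts are consistent with one redacted text) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the candidate reconstructions are written by hand rather than generated by a transformer, `embed` is a toy bag-of-words stand-in for a real sentence embedding, and the function names and the 0.9 similarity threshold are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.
    (Assumption: a real system would use a learned sentence embedding.)"""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_distinct(candidates, threshold=0.9):
    """Greedily count candidates that are not near-duplicates of one another.
    The result plays the role of a k-anonymity-like 'k' (hypothetical metric)."""
    representatives = []
    for text in candidates:
        vec = embed(text)
        if all(cosine_similarity(vec, rep) < threshold for rep in representatives):
            representatives.append(vec)
    return len(representatives)


def mean_pairwise_distance(candidates):
    """Average cosine distance over all candidate pairs, as a rough diversity score."""
    vecs = [embed(t) for t in candidates]
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    if not pairs:
        return 0.0
    return sum(1 - cosine_similarity(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)


# Hand-written candidates standing in for model-generated reconstructions
# of a redacted text such as "[NAME] met [NAME] in [PLACE]".
candidates = [
    "Alice met Bob in Dublin",
    "alice met bob in dublin",   # near-duplicate of the first candidate
    "Carol met Dave in Paris",
    "Erin met Frank in Rome",
]

k = count_distinct(candidates)
diversity = mean_pairwise_distance(candidates)
print(f"distinct reconstructions: {k}, mean pairwise distance: {diversity:.2f}")
```

A larger count and a larger mean distance would suggest that more, and more varied, full texts remain plausible given the redacted text; how the paper itself weighs number, diversity and quality is not stated in this record.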