Bibliographic Details
Main Authors: Li, Jiajia, Yang, Lu, Peng, Letian, Zhang, Shitou, Wang, Ping, Li, Zuchao, Zhao, Hai
Format: Preprint
Published: 2022
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2201.00965
_version_ 1866929414187515904
author Li, Jiajia
Yang, Lu
Peng, Letian
Zhang, Shitou
Wang, Ping
Li, Zuchao
Zhao, Hai
author_facet Li, Jiajia
Yang, Lu
Peng, Letian
Zhang, Shitou
Wang, Ping
Li, Zuchao
Zhao, Hai
contents In recent years, machine learning, particularly deep learning, has significantly impacted the field of information management. While several strategies have been proposed to keep models from learning and memorizing sensitive information in raw text, this paper proposes a more linguistically grounded approach: distorting texts while preserving their semantics. To this end, we leverage Neighboring Distribution Divergence, a novel metric for assessing how well semantic meaning is preserved during distortion. Building on this metric, we present two frameworks for semantics-preserving distortion: a generative approach and a substitutive approach. Evaluations across several tasks, including named entity recognition, constituency parsing, and machine reading comprehension, support the plausibility and efficacy of our distortion technique for personal privacy protection. We also test our method against attribute attacks in three privacy-focused NLP tasks; the findings indicate that our data-based improvement approach is simpler and more effective than structural improvement approaches. Finally, we explore privacy protection in a specific medical information management scenario and show that our method effectively limits the memorization of sensitive data, which underscores its practicality.
format Preprint
id arxiv_https___arxiv_org_abs_2201_00965
institution arXiv
publishDate 2022
record_format arxiv
title Semantics-Preserved Distortion for Personal Privacy Protection in Information Management
topic Computation and Language
url https://arxiv.org/abs/2201.00965