Staff View: Semantics-Preserved Distortion for Personal Privacy Protection in Information Management :: Library Catalog

Bibliographic Details
Main Authors:	Li, Jiajia; Yang, Lu; Peng, Letian; Zhang, Shitou; Wang, Ping; Li, Zuchao; Zhao, Hai
Format:	Preprint
Published:	2022
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2201.00965

_version_	1866929414187515904
author	Li, Jiajia; Yang, Lu; Peng, Letian; Zhang, Shitou; Wang, Ping; Li, Zuchao; Zhao, Hai
author_facet	Li, Jiajia; Yang, Lu; Peng, Letian; Zhang, Shitou; Wang, Ping; Li, Zuchao; Zhao, Hai
contents	In recent years, machine learning, particularly deep learning, has significantly impacted the field of information management. While several strategies have been proposed to restrict models from learning and memorizing sensitive information from raw texts, this paper suggests a more linguistically grounded approach to distort texts while maintaining semantic integrity. To this end, we leverage Neighboring Distribution Divergence, a novel metric to assess the preservation of semantic meaning during distortion. Building on this metric, we present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach. Our evaluations across various tasks, including named entity recognition, constituency parsing, and machine reading comprehension, affirm the plausibility and efficacy of our distortion technique in personal privacy protection. We also test our method against attribute attacks in three privacy-focused assignments within the NLP domain, and the findings underscore the simplicity and efficacy of our data-based improvement approach over structural improvement approaches. Moreover, we explore privacy protection in a specific medical information management scenario, showing that our method effectively limits sensitive data memorization, which underscores its practicality.
format	Preprint
id	arxiv_https___arxiv_org_abs_2201_00965
institution	arXiv
publishDate	2022
record_format	arxiv
title	Semantics-Preserved Distortion for Personal Privacy Protection in Information Management
topic	Computation and Language
url	https://arxiv.org/abs/2201.00965

Similar Items