Bibliographic Details
Main Authors: Sun, Ziteng; Kairouz, Peter; Sun, Haicheng; Gascon, Adria; Suresh, Ananda Theertha
Format: Preprint
Published: 2024
Subjects: Data Structures and Algorithms
Online Access: https://arxiv.org/abs/2404.11607
author Sun, Ziteng
Kairouz, Peter
Sun, Haicheng
Gascon, Adria
Suresh, Ananda Theertha
contents The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role in improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for Gboard, which builds on recent advances in private federated analytics. The system offers local differential privacy (LDP) guarantees for user-contributed words. With anonymous aggregation, the final released result would satisfy central differential privacy guarantees with $\varepsilon = 0.315$, $\delta = 10^{-10}$ for OOV discovery in en-US (English in the United States).
format Preprint
id arxiv_https___arxiv_org_abs_2404_11607
institution arXiv
publishDate 2024
record_format arxiv
title Private federated discovery of out-of-vocabulary words for Gboard
topic Data Structures and Algorithms
url https://arxiv.org/abs/2404.11607