| Main Authors: | Sun, Ziteng; Kairouz, Peter; Sun, Haicheng; Gascon, Adria; Suresh, Ananda Theertha |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Data Structures and Algorithms |
| Online Access: | https://arxiv.org/abs/2404.11607 |
| _version_ | 1866911845167661056 |
|---|---|
| author | Sun, Ziteng; Kairouz, Peter; Sun, Haicheng; Gascon, Adria; Suresh, Ananda Theertha |
| author_facet | Sun, Ziteng; Kairouz, Peter; Sun, Haicheng; Gascon, Adria; Suresh, Ananda Theertha |
| contents | The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role in improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for Gboard, which builds on recent advances in private federated analytics. The system offers local differential privacy (LDP) guarantees for user-contributed words. With anonymous aggregation, the final released result would satisfy central differential privacy guarantees with $\varepsilon = 0.315$, $\delta = 10^{-10}$ for OOV discovery in en-US (English in the United States). |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2404_11607 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Private federated discovery of out-of-vocabulary words for Gboard |
| topic | Data Structures and Algorithms |
| url | https://arxiv.org/abs/2404.11607 |