Saved in:
Bibliographic Details
Main Authors: Ou, Tingting, Medina, Marco Avella, Cummings, Rachel
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2407.14879
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866911963634728960
author Ou, Tingting
Medina, Marco Avella
Cummings, Rachel
author_facet Ou, Tingting
Medina, Marco Avella
Cummings, Rachel
contents In this work we first show that the classical Thompson sampling algorithm for multi-arm bandits is differentially private as-is, without any modification. We provide per-round privacy guarantees as a function of problem parameters and show composition over $T$ rounds; since the algorithm is unchanged, existing $O(\sqrt{NT\log N})$ regret bounds still hold and there is no loss in performance due to privacy. We then show that simple modifications -- such as pre-pulling all arms a fixed number of times, increasing the sampling variance -- can provide tighter privacy guarantees. We again provide privacy guarantees that now depend on the new parameters introduced in the modification, which allows the analyst to tune the privacy guarantee as desired. We also provide a novel regret analysis for this new algorithm, and show how the new parameters also impact expected regret. Finally, we empirically validate and illustrate our theoretical findings in two parameter regimes and demonstrate that tuning the new parameters substantially improve the privacy-regret tradeoff.
format Preprint
id arxiv_https___arxiv_org_abs_2407_14879
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Thompson Sampling Itself is Differentially Private
Ou, Tingting
Medina, Marco Avella
Cummings, Rachel
Machine Learning
Data Structures and Algorithms
In this work we first show that the classical Thompson sampling algorithm for multi-arm bandits is differentially private as-is, without any modification. We provide per-round privacy guarantees as a function of problem parameters and show composition over $T$ rounds; since the algorithm is unchanged, existing $O(\sqrt{NT\log N})$ regret bounds still hold and there is no loss in performance due to privacy. We then show that simple modifications -- such as pre-pulling all arms a fixed number of times, increasing the sampling variance -- can provide tighter privacy guarantees. We again provide privacy guarantees that now depend on the new parameters introduced in the modification, which allows the analyst to tune the privacy guarantee as desired. We also provide a novel regret analysis for this new algorithm, and show how the new parameters also impact expected regret. Finally, we empirically validate and illustrate our theoretical findings in two parameter regimes and demonstrate that tuning the new parameters substantially improve the privacy-regret tradeoff.
title Thompson Sampling Itself is Differentially Private
topic Machine Learning
Data Structures and Algorithms
url https://arxiv.org/abs/2407.14879