Bibliographic Details
Main Author: Zhou, Xingyu
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2605.23131
author Zhou, Xingyu
author_facet Zhou, Xingyu
contents In this note, I would like to share a small research moment in which Codex helped me find the right way to adapt rare switching to the private setting. The standard determinant-based update rule in linear bandits and RL works beautifully because the design matrix grows monotonically. But once Gaussian noise is added for privacy, this monotonicity can fail, and the usual analysis no longer goes through. The key reason is that determinant growth controls volume, whereas the regret analysis needs control of the worst direction. To address this, Codex came up with a different rare-switching rule based on the generalized Rayleigh quotient, which restores a logarithmic number of policy updates and the desired confidence-width comparison up to a constant factor. I present my manually cleaned-up version of the proof here, along with some personal reflections on this example.
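As a rough illustration of the gap between volume and worst direction described in the abstract, here is a hypothetical sketch, not the rule from the note. When V_new ⪰ V_last (the monotone, non-private case), every eigenvalue of V_last^{-1} V_new is at least 1, so bounding their product (the determinant ratio) by C also bounds each of them by C. Once noise can shrink some directions, eigenvalues may fall below 1 and the product no longer controls the largest one. The function names, the threshold `C`, the switching condition (switch when the largest generalized Rayleigh quotient exceeds `C`), and the toy matrices are all assumptions made here for illustration; the sketch requires NumPy.

```python
import numpy as np


def det_switch(V_new: np.ndarray, V_last: np.ndarray, C: float = 2.0) -> bool:
    """Classic determinant test: switch when det(V_new) > C * det(V_last).

    Assumes both matrices are symmetric positive definite, so slogdet's sign is +1.
    """
    # Compare log-determinants; this is more stable than det() in higher dimensions.
    _, logdet_new = np.linalg.slogdet(V_new)
    _, logdet_last = np.linalg.slogdet(V_last)
    return bool(logdet_new > np.log(C) + logdet_last)


def max_rayleigh_quotient(V_new: np.ndarray, V_last: np.ndarray) -> float:
    """Largest value of (x^T V_new x) / (x^T V_last x) over nonzero x.

    Equals the largest eigenvalue of L^{-1} V_new L^{-T}, where V_last = L L^T.
    Assumes V_last is symmetric positive definite and V_new is symmetric.
    """
    L = np.linalg.cholesky(V_last)   # lower-triangular factor of V_last
    A = np.linalg.solve(L, V_new)    # L^{-1} V_new
    M = np.linalg.solve(L, A.T)      # L^{-1} V_new L^{-T}
    M = 0.5 * (M + M.T)              # remove round-off asymmetry
    return float(np.linalg.eigvalsh(M)[-1])  # eigvalsh sorts ascending


def rayleigh_switch(V_new: np.ndarray, V_last: np.ndarray, C: float = 2.0) -> bool:
    """Hypothetical worst-direction test: switch when some direction grew by more than C."""
    return max_rayleigh_quotient(V_new, V_last) > C


if __name__ == "__main__":
    # Made-up toy case: noise shrank direction e1 while direction e2 grew threefold.
    V_last = np.eye(2)
    V_new = np.diag([0.5, 3.0])
    print(det_switch(V_new, V_last))             # False: det ratio is only 1.5
    print(max_rayleigh_quotient(V_new, V_last))  # ~3.0: worst direction grew 3x
    print(rayleigh_switch(V_new, V_last))        # True
```

In this toy case the determinant test keeps the old policy, yet along the second coordinate the old confidence width is about sqrt(3) times the current one, which exceeds the sqrt(2) factor a monotone analysis with C = 2 would guarantee. A test on the largest quotient looks at exactly that direction, which is the intuition behind basing the switching rule on the generalized Rayleigh quotient.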
format Preprint
id arxiv_https___arxiv_org_abs_2605_23131
institution arXiv
publishDate 2026
record_format arxiv
title When Determinants Are Not Enough: Private Rare Switching
topic Machine Learning
url https://arxiv.org/abs/2605.23131