Staff View: When Determinants Are Not Enough: Private Rare Switching

Bibliographic Details
Main Author:	Zhou, Xingyu
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.23131

_version_	1866911707494875136
author	Zhou, Xingyu
author_facet	Zhou, Xingyu
contents	In this note, I share a small research moment in which Codex helped me find the right way to adapt rare switching to the private setting. The standard determinant-based update rule in linear bandits and RL works well because the design matrix grows monotonically. Once Gaussian noise is added for privacy, however, this monotonicity can fail, and the usual analysis no longer goes through. The key reason is that determinant growth controls volume, while regret analysis needs control of the worst direction. To address this, Codex came up with a different rare-switching rule based on the generalized Rayleigh quotient, which restores logarithmic policy updates and the desired confidence-width comparison up to a constant factor. I present my manually cleaned-up version of the proof here, as well as some personal reflections on this example.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_23131
institution	arXiv
publishDate	2026
record_format	arxiv
title	When Determinants Are Not Enough: Private Rare Switching
topic	Machine Learning
url	https://arxiv.org/abs/2605.23131
