Staff View: On Randomized Algorithms in Online Strategic Classification :: Library Catalog

Bibliographic Details
Main Authors:	Hutton, Chase; Melrod, Adam; Shao, Han
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computer Science and Game Theory
Online Access:	https://arxiv.org/abs/2602.06257

_version_	1866912882802819072
author	Hutton, Chase; Melrod, Adam; Shao, Han
author_facet	Hutton, Chase; Melrod, Adam; Shao, Han
contents	Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such strategic behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}T|\mathcal{H}|)$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal{H}|})$. In this work, we provide refined bounds for online strategic classification in both settings. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal{H}) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal{H}) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We also provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal{H}) \cdot \Delta\log \Delta)$. In the agnostic setting, we give a proper learner using convex optimization techniques to improve the regret upper bound to $O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$. We show a matching lower bound up to logarithmic factors for all proper learning rules, demonstrating the optimality of our learner among proper learners. As such, improper learning is necessary to further improve regret guarantees.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_06257
institution	arXiv
publishDate	2026
record_format	arxiv
title	On Randomized Algorithms in Online Strategic Classification
topic	Machine Learning; Computer Science and Game Theory
url	https://arxiv.org/abs/2602.06257

Similar Items