Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Heuillet, Maxime, Ahmad, Ola, Durand, Audrey
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.05002
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913351971373056
author	Heuillet, Maxime Ahmad, Ola Durand, Audrey
author_facet	Heuillet, Maxime Ahmad, Ola Durand, Audrey
contents	The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds. We also extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPsidestar strategies have favorable performance against state-of-the-art baselines in multiple PM games. To advocate for the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_05002
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Randomized Confidence Bounds for Stochastic Partial Monitoring Heuillet, Maxime Ahmad, Ola Durand, Audrey Machine Learning The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds. We also extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPsidestar strategies have favorable performance against state-of-the-art baselines in multiple PM games. To advocate for the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.
title	Randomized Confidence Bounds for Stochastic Partial Monitoring
topic	Machine Learning
url	https://arxiv.org/abs/2402.05002

Similar Items