Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Cuonzo, Simone, Deliu, Nina
Format:	Preprint
Veröffentlicht:	2025
Schlagworte:	Machine Learning
Online-Zugang:	https://arxiv.org/abs/2512.09850
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866908704430882816
author	Cuonzo, Simone Deliu, Nina
author_facet	Cuonzo, Simone Deliu, Nina
contents	We introduce Conformal Bandits, a novel framework integrating Conformal Prediction (CP) into bandit problems, a classic paradigm for sequential decision-making under uncertainty. Traditional regret-minimisation bandit strategies like Thompson Sampling and Upper Confidence Bound (UCB) typically rely on distributional assumptions or asymptotic guarantees; further, they remain largely focused on regret, neglecting their statistical properties. We address this gap. Through the adoption of CP, we bridge the regret-minimising potential of a decision-making bandit policy with statistical guarantees in the form of finite-time prediction coverage. We demonstrate the potential of it Conformal Bandits through simulation studies and an application to portfolio allocation, a typical small-gap regime, where differences in arm rewards are far too small for classical policies to achieve optimal regret bounds in finite sample. Motivated by this, we showcase our framework's practical advantage in terms of regret in small-gap settings, as well as its added value in achieving nominal coverage guarantees where classical UCB policies fail. Focusing on our application of interest, we further illustrate how integrating hidden Markov models to capture the regime-switching behaviour of financial markets, enhances the exploration-exploitation trade-off, and translates into higher risk-adjusted regret efficiency returns, while preserving coverage guarantees.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_09850
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Conformal Bandits: Bringing statistical validity and reward efficiency to the small-gap regime Cuonzo, Simone Deliu, Nina Machine Learning We introduce Conformal Bandits, a novel framework integrating Conformal Prediction (CP) into bandit problems, a classic paradigm for sequential decision-making under uncertainty. Traditional regret-minimisation bandit strategies like Thompson Sampling and Upper Confidence Bound (UCB) typically rely on distributional assumptions or asymptotic guarantees; further, they remain largely focused on regret, neglecting their statistical properties. We address this gap. Through the adoption of CP, we bridge the regret-minimising potential of a decision-making bandit policy with statistical guarantees in the form of finite-time prediction coverage. We demonstrate the potential of it Conformal Bandits through simulation studies and an application to portfolio allocation, a typical small-gap regime, where differences in arm rewards are far too small for classical policies to achieve optimal regret bounds in finite sample. Motivated by this, we showcase our framework's practical advantage in terms of regret in small-gap settings, as well as its added value in achieving nominal coverage guarantees where classical UCB policies fail. Focusing on our application of interest, we further illustrate how integrating hidden Markov models to capture the regime-switching behaviour of financial markets, enhances the exploration-exploitation trade-off, and translates into higher risk-adjusted regret efficiency returns, while preserving coverage guarantees.
title	Conformal Bandits: Bringing statistical validity and reward efficiency to the small-gap regime
topic	Machine Learning
url	https://arxiv.org/abs/2512.09850

Ähnliche Einträge