Saved in:
Bibliographic Details
Main Authors: Chen, Ziqun, Cai, Kechao, Chen, Zhuoyue, Zhang, Jinbei, Lui, John C. S.
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2407.15439
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910545359142912
author Chen, Ziqun
Cai, Kechao
Chen, Zhuoyue
Zhang, Jinbei
Lui, John C. S.
author_facet Chen, Ziqun
Cai, Kechao
Chen, Zhuoyue
Zhang, Jinbei
Lui, John C. S.
contents We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
format Preprint
id arxiv_https___arxiv_org_abs_2407_15439
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
Chen, Ziqun
Cai, Kechao
Chen, Zhuoyue
Zhang, Jinbei
Lui, John C. S.
Machine Learning
We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
title Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
topic Machine Learning
url https://arxiv.org/abs/2407.15439