Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Chen, Ziqun, Cai, Kechao, Chen, Zhuoyue, Zhang, Jinbei, Lui, John C. S.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.15439
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910545359142912
author	Chen, Ziqun Cai, Kechao Chen, Zhuoyue Zhang, Jinbei Lui, John C. S.
author_facet	Chen, Ziqun Cai, Kechao Chen, Zhuoyue Zhang, Jinbei Lui, John C. S.
contents	We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_15439
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays Chen, Ziqun Cai, Kechao Chen, Zhuoyue Zhang, Jinbei Lui, John C. S. Machine Learning We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
title	Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
topic	Machine Learning
url	https://arxiv.org/abs/2407.15439

Similar Items