Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Chawla, Ronshee, Sankararaman, Abishek, Ganesh, Ayalvadi, Shakkottai, Sanjay
Format:	Preprint
Published:	2020
Subjects:	Machine Learning Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Social and Information Networks
Online Access:	https://arxiv.org/abs/2001.05452
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866909239714250752
author	Chawla, Ronshee Sankararaman, Abishek Ganesh, Ayalvadi Shakkottai, Sanjay
author_facet	Chawla, Ronshee Sankararaman, Abishek Ganesh, Ayalvadi Shakkottai, Sanjay
contents	We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph. We develop two novel algorithms, where each agent only plays from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $Ω(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaborations. Furthermore, we show that the communication constraints only have a second order effect on the regret of our algorithm. We then analyze this second order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which gives that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.
format	Preprint
id	arxiv_https___arxiv_org_abs_2001_05452
institution	arXiv
publishDate	2020
record_format	arxiv
spellingShingle	The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits Chawla, Ronshee Sankararaman, Abishek Ganesh, Ayalvadi Shakkottai, Sanjay Machine Learning Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Social and Information Networks We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph. We develop two novel algorithms, where each agent only plays from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $Ω(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaborations. Furthermore, we show that the communication constraints only have a second order effect on the regret of our algorithm. We then analyze this second order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which gives that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.
title	The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits
topic	Machine Learning Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Social and Information Networks
url	https://arxiv.org/abs/2001.05452

Similar Items