Staff View: The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection :: Library Catalog

Bibliographic Details
Main Authors:	Russell, Benedict; Leung, Chin-wing; Turrini, Paolo
Format:	Preprint
Published:	2026
Subjects:	Multiagent Systems
Online Access:	https://arxiv.org/abs/2605.18185

_version_	1866916022834954240
author	Russell, Benedict; Leung, Chin-wing; Turrini, Paolo
author_facet	Russell, Benedict; Leung, Chin-wing; Turrini, Paolo
contents	In social dilemmas, self-interested learning agents face the choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists on the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this evidence comes largely from agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_18185
institution	arXiv
publishDate	2026
record_format	arxiv
title	The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection
topic	Multiagent Systems
url	https://arxiv.org/abs/2605.18185
