Bibliographic Details
Main Authors: Russell, Benedict; Leung, Chin-wing; Turrini, Paolo
Format: Preprint
Published: 2026
Subjects: Multiagent Systems
Online Access: https://arxiv.org/abs/2605.18185
author Russell, Benedict
Leung, Chin-wing
Turrini, Paolo
contents In social dilemmas, self-interested learning agents face a choice between the societal benefit of cooperation and the immediate reward of defection. There is significant evidence that assortment mechanisms such as partner selection benefit the emergence of cooperation, but this evidence comes largely from agent-based simulations. In this paper, we provide an analytical solution to the problem by studying policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution, and hence the reward landscape, and prove that this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.
format Preprint
id arxiv_https___arxiv_org_abs_2605_18185
institution arXiv
publishDate 2026
record_format arxiv
title The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection
topic Multiagent Systems
url https://arxiv.org/abs/2605.18185