Bibliographic Details
Main Author: Buzzard, Zak
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2503.09521
author Buzzard, Zak
contents Extending deep Q-learning to cooperative multi-agent settings is challenging due to the exponential growth of the joint action space, the non-stationary environment, and the credit assignment problem. Value decomposition allows deep Q-learning to be applied at the joint agent level, at the cost of reduced expressivity. Building on past work in this direction, our paper proposes PairVDN, a novel method for decomposing the value function into a collection of pair-wise, rather than per-agent, functions, improving expressivity at the cost of requiring a more complex (but still efficient) dynamic programming maximisation algorithm. Our method enables the representation of value functions which cannot be expressed as a monotonic combination of per-agent functions, unlike past approaches such as VDN and QMIX. We implement a novel many-agent cooperative environment, Box Jump, and demonstrate improved performance over these baselines in this setting. We open-source our code and environment at https://github.com/zzbuzzard/PairVDN.
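The abstract mentions a dynamic programming maximisation over pair-wise value functions but does not spell it out. The following is a minimal sketch of the general idea, not the paper's algorithm. It assumes, purely for illustration, that the pairs form a simple chain (agent i is paired with agent i+1) and that each pair-wise function is given as a small table. The names `chain_argmax` and `brute_force_argmax` are hypothetical and do not come from the PairVDN repository.

```python
from itertools import product


def chain_argmax(pair_q):
    """Maximise sum_i pair_q[i][a_i][a_{i+1}] over all joint actions.

    ASSUMPTION (not from the record): pairs form a chain of adjacent agents.
    pair_q is a list of n-1 tables, each of size A x A, for n agents.
    Runs in O(n * A^2) time instead of enumerating all A^n joint actions.
    """
    num_actions = len(pair_q[0])
    # best[a]: best total over the pairs processed so far, given that the
    # most recently processed agent takes action a.
    best = [0.0] * num_actions
    backptrs = []
    for q in pair_q:
        new_best, ptr = [], []
        for b in range(num_actions):
            # Best action of the previous agent if the next agent plays b.
            a_star = max(range(num_actions), key=lambda a: best[a] + q[a][b])
            ptr.append(a_star)
            new_best.append(best[a_star] + q[a_star][b])
        best = new_best
        backptrs.append(ptr)
    # Backtrack from the best final action to recover the joint action.
    last = max(range(num_actions), key=lambda a: best[a])
    actions = [last]
    for ptr in reversed(backptrs):
        actions.append(ptr[actions[-1]])
    actions.reverse()
    return best[last], actions


def brute_force_argmax(pair_q):
    """Reference maximiser that enumerates every joint action (A^n of them)."""
    num_actions = len(pair_q[0])
    n = len(pair_q) + 1

    def total(acts):
        return sum(q[acts[i]][acts[i + 1]] for i, q in enumerate(pair_q))

    best_acts = max(product(range(num_actions), repeat=n), key=total)
    return total(best_acts), list(best_acts)


# Two agents rewarded only when their actions differ (an XOR-style payoff).
xor_table = [[[0, 1], [1, 0]]]
value, joint_action = chain_argmax(xor_table)
```

The XOR-style table in the usage lines is an example of a payoff that no monotonic combination of per-agent functions can represent, since raising one agent's preferred action would have to help regardless of what the other agent does. A single pair-wise table stores it directly.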
format Preprint
id arxiv_https___arxiv_org_abs_2503_09521
institution arXiv
publishDate 2025
record_format arxiv
title PairVDN - Pair-wise Decomposed Value Functions
topic Artificial Intelligence
url https://arxiv.org/abs/2503.09521