Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Sinha, Amit, Geist, Matthieu, Mahajan, Aditya
Format:	Preprint
Published:	2026
Subjects:	Multiagent Systems
Online Access:	https://arxiv.org/abs/2604.09495
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914464209567744
author	Sinha, Amit Geist, Matthieu Mahajan, Aditya
author_facet	Sinha, Amit Geist, Matthieu Mahajan, Aditya
contents	Optimally solving decentralized decision-making problems modeled as Dec-POMDPs is known to be NEXP-complete. These optimal solutions are policies based on the entire history of observations and actions of an agent. However, some applications may require more compact policies because of limited compute capabilities, which can be modeled by considering a limited number of memory states (or agent states). While such an agent-state based policy class may not contain the optimal solution, it is still of practical interest to find the best agent-state policy within the class. We focus on an iterated best response style algorithm which guarantees monotonic improvements and convergence to a local optimum in polynomial runtime in the Dec-POMDP model size. In order to obtain a better local optimum, we use a modified objective which incentivizes risk-seeking alongside a conservative policy iteration update. Our empirical results show that our approach performs as well as state-of-the-art approaches on several benchmark Dec-POMDPs, achieving near-optimal performance while having polynomial runtime despite the limited memory. We also show that using more agent states (a larger memory) leads to greater performance. Our approach provides a novel way of incorporating memory constraints on the agents in the Dec-POMDP problem.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_09495
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Risk-seeking conservative policy iteration with agent-state based policies for Dec-POMDPs with guaranteed convergence Sinha, Amit Geist, Matthieu Mahajan, Aditya Multiagent Systems Optimally solving decentralized decision-making problems modeled as Dec-POMDPs is known to be NEXP-complete. These optimal solutions are policies based on the entire history of observations and actions of an agent. However, some applications may require more compact policies because of limited compute capabilities, which can be modeled by considering a limited number of memory states (or agent states). While such an agent-state based policy class may not contain the optimal solution, it is still of practical interest to find the best agent-state policy within the class. We focus on an iterated best response style algorithm which guarantees monotonic improvements and convergence to a local optimum in polynomial runtime in the Dec-POMDP model size. In order to obtain a better local optimum, we use a modified objective which incentivizes risk-seeking alongside a conservative policy iteration update. Our empirical results show that our approach performs as well as state-of-the-art approaches on several benchmark Dec-POMDPs, achieving near-optimal performance while having polynomial runtime despite the limited memory. We also show that using more agent states (a larger memory) leads to greater performance. Our approach provides a novel way of incorporating memory constraints on the agents in the Dec-POMDP problem.
title	Risk-seeking conservative policy iteration with agent-state based policies for Dec-POMDPs with guaranteed convergence
topic	Multiagent Systems
url	https://arxiv.org/abs/2604.09495

Similar Items