Enregistré dans:
Détails bibliographiques
Auteurs principaux: Sengupta, Sikata, Liu, Guangyi, Gottesman, Omer, Durham, Joseph W, Kearns, Michael, Roth, Aaron, Caldara, Michael
Format: Preprint
Publié: 2026
Sujets:
Accès en ligne:https://arxiv.org/abs/2602.24182
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866912931013197824
author Sengupta, Sikata
Liu, Guangyi
Gottesman, Omer
Durham, Joseph W
Kearns, Michael
Roth, Aaron
Caldara, Michael
author_facet Sengupta, Sikata
Liu, Guangyi
Gottesman, Omer
Durham, Joseph W
Kearns, Michael
Roth, Aaron
Caldara, Michael
contents Optimizing the consolidation process in container-based fulfillment centers requires trading off competing objectives such as processing speed, resource usage, and space utilization while adhering to a range of real-world operational constraints. This process involves moving items between containers via a combination of human and robotic workstations to free up space for inbound inventory and increase container utilization. We formulate this problem as a large-scale Multi-Objective Reinforcement Learning (MORL) task with high-dimensional state spaces and dynamic system behavior. Our method builds on recent theoretical advances in solving constrained RL problems via best-response and no-regret dynamics in zero-sum games, enabling principled minimax policy learning. Policy evaluation on realistic warehouse simulations shows that our approach effectively trades off objectives, and we empirically observe that it learns a single policy that simultaneously satisfies all constraints, even if this is not theoretically guaranteed. We further introduce a theoretical framework to handle the problem of error cancellation, where time-averaged solutions display oscillatory behavior. This method returns a single iterate whose Lagrangian value is close to the minimax value of the game. These results demonstrate the promise of MORL in solving complex, high-impact decision-making problems in large-scale industrial systems.
format Preprint
id arxiv_https___arxiv_org_abs_2602_24182
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Multi-Objective Reinforcement Learning for Large-Scale Tote Allocation in Human-Robot Collaborative Fulfillment Centers
Sengupta, Sikata
Liu, Guangyi
Gottesman, Omer
Durham, Joseph W
Kearns, Michael
Roth, Aaron
Caldara, Michael
Machine Learning
Optimizing the consolidation process in container-based fulfillment centers requires trading off competing objectives such as processing speed, resource usage, and space utilization while adhering to a range of real-world operational constraints. This process involves moving items between containers via a combination of human and robotic workstations to free up space for inbound inventory and increase container utilization. We formulate this problem as a large-scale Multi-Objective Reinforcement Learning (MORL) task with high-dimensional state spaces and dynamic system behavior. Our method builds on recent theoretical advances in solving constrained RL problems via best-response and no-regret dynamics in zero-sum games, enabling principled minimax policy learning. Policy evaluation on realistic warehouse simulations shows that our approach effectively trades off objectives, and we empirically observe that it learns a single policy that simultaneously satisfies all constraints, even if this is not theoretically guaranteed. We further introduce a theoretical framework to handle the problem of error cancellation, where time-averaged solutions display oscillatory behavior. This method returns a single iterate whose Lagrangian value is close to the minimax value of the game. These results demonstrate the promise of MORL in solving complex, high-impact decision-making problems in large-scale industrial systems.
title Multi-Objective Reinforcement Learning for Large-Scale Tote Allocation in Human-Robot Collaborative Fulfillment Centers
topic Machine Learning
url https://arxiv.org/abs/2602.24182