Bibliographic Details
Main Authors: Corrêa, Arthur; Jesus, Alexandre; Nascimento, Paulo; Silva, Cristóvão; Moniz, Samuel
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2505.03323
_version_ 1866918220327288832
author Corrêa, Arthur
Jesus, Alexandre
Nascimento, Paulo
Silva, Cristóvão
Moniz, Samuel
contents In this work, we conduct an extensive empirical study of several deep reinforcement learning algorithms on two challenging combinatorial optimization problems: the job-shop and flexible job-shop scheduling problems, both fundamental problems with many industrial applications. Broadly, deep reinforcement learning algorithms fall into two categories: policy-gradient and value-based. While value-based algorithms have achieved notable success in domains such as the Arcade Learning Environment, the combinatorial optimization community has predominantly favored policy-gradient algorithms, often overlooking the potential of value-based alternatives. In our results, value-based algorithms showed lower variance and a more stable convergence profile than policy-gradient ones. Moreover, they achieved superior cross-size and cross-distribution generalization, that is, they effectively solved instances substantially larger than or structurally distinct from those seen during training. Finally, our analysis suggests that the relative performance of each category of algorithms may depend on structural properties of the problem, such as problem flexibility and instance size. Overall, our findings challenge the prevailing assumption that policy-gradient algorithms are inherently superior for combinatorial optimization. We show instead that value-based algorithms can match or even surpass the performance of policy-gradient algorithms, suggesting that they deserve greater attention from the combinatorial optimization community. Our code is openly available at: https://github.com/AJ-Correa/Unraveling-the-Rainbow
format Preprint
id arxiv_https___arxiv_org_abs_2505_03323
institution arXiv
publishDate 2025
record_format arxiv
title Unraveling the Rainbow: can value-based methods schedule?
topic Machine Learning
url https://arxiv.org/abs/2505.03323