Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Corrêa, Arthur, Jesus, Alexandre, Nascimento, Paulo, Silva, Cristóvão, Moniz, Samuel
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.03323
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866918220327288832
author	Corrêa, Arthur Jesus, Alexandre Nascimento, Paulo Silva, Cristóvão Moniz, Samuel
author_facet	Corrêa, Arthur Jesus, Alexandre Nascimento, Paulo Silva, Cristóvão Moniz, Samuel
contents	In this work, we conduct an extensive empirical study of several deep reinforcement learning algorithms on two challenging combinatorial optimization problems: the job-shop and flexible job-shop scheduling problems, both fundamental challenges with multiple industrial applications. Broadly, deep reinforcement learning algorithms fall into two categories: policy-gradient and value-based. While value-based algorithms have achieved notable success in domains such as the Arcade Learning Environment, the combinatorial optimization community has predominantly favored policy-gradient algorithms, often overlooking the potential of value-based alternatives. From our results, value-based algorithms demonstrated a lower variance and a more stable convergence profile compared to policy-gradient ones. Moreover, they achieved superior cross-size and cross-distribution generalization, that is, effectively solving instances that are substantially larger or structurally distinct from those seen during training. Finally, our analysis also suggests that the relative performance of each category of algorithms may be dependent on structural properties of the problem, such as problem flexibility and instance size. Overall, our findings challenge the prevailing assumption that policy-gradient algorithms are inherently superior for combinatorial optimization. We show instead that value-based algorithms can match or even surpass the performance of policy-gradient algorithms, suggesting that they deserve greater attention from the combinatorial optimization community. Our code is openly available at: https://github.com/AJ-Correa/Unraveling-the-Rainbow
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_03323
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Unraveling the Rainbow: can value-based methods schedule? Corrêa, Arthur Jesus, Alexandre Nascimento, Paulo Silva, Cristóvão Moniz, Samuel Machine Learning In this work, we conduct an extensive empirical study of several deep reinforcement learning algorithms on two challenging combinatorial optimization problems: the job-shop and flexible job-shop scheduling problems, both fundamental challenges with multiple industrial applications. Broadly, deep reinforcement learning algorithms fall into two categories: policy-gradient and value-based. While value-based algorithms have achieved notable success in domains such as the Arcade Learning Environment, the combinatorial optimization community has predominantly favored policy-gradient algorithms, often overlooking the potential of value-based alternatives. From our results, value-based algorithms demonstrated a lower variance and a more stable convergence profile compared to policy-gradient ones. Moreover, they achieved superior cross-size and cross-distribution generalization, that is, effectively solving instances that are substantially larger or structurally distinct from those seen during training. Finally, our analysis also suggests that the relative performance of each category of algorithms may be dependent on structural properties of the problem, such as problem flexibility and instance size. Overall, our findings challenge the prevailing assumption that policy-gradient algorithms are inherently superior for combinatorial optimization. We show instead that value-based algorithms can match or even surpass the performance of policy-gradient algorithms, suggesting that they deserve greater attention from the combinatorial optimization community. Our code is openly available at: https://github.com/AJ-Correa/Unraveling-the-Rainbow
title	Unraveling the Rainbow: can value-based methods schedule?
topic	Machine Learning
url	https://arxiv.org/abs/2505.03323

Similar Items