Bibliographic Details
Main Authors: Sheng, Junjie, Wu, Jiehao, Cui, Haochuan, Hu, Yiqiu, Zhou, Wenli, Zhu, Lei, Peng, Qian, Li, Wenhao, Wang, Xiangfeng
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2503.00537
Abstract:
  • Recent advances in reinforcement learning (RL) have shown promise for optimizing virtual machine scheduling (VMS) in small-scale clusters, but the application of RL to large-scale cloud computing scenarios remains notably constrained. This paper introduces a scalable RL framework, Cluster Value Decomposition Reinforcement Learning (CVD-RL), to overcome the scalability hurdles inherent in large-scale VMS. CVD-RL combines a decomposition operator with a look-ahead operator to manage representation complexity, complemented by a Top-$k$ filter operator that improves exploration efficiency. Whereas existing approaches are limited to clusters of $10$ or fewer physical machines (PMs), CVD-RL scales to environments with up to $50$ PMs. In empirical studies, CVD-RL also generalizes better than contemporary state-of-the-art (SOTA) methods across a variety of scenarios. These results demonstrate the framework's scalability and performance and represent a significant step forward in applying RL to VMS within complex, large-scale cloud infrastructures. The code is available at https://anonymous.4open.science/r/marl4sche-D0FE.
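The abstract names a Top-$k$ filter operator but does not say how it is implemented. The following is only a minimal sketch of the general idea, under two assumptions that are not taken from the paper: each PM receives a scalar score for hosting the incoming VM, and the filter keeps the $k$ best-scoring feasible PMs as the candidate set that exploration then samples from. The function name `top_k_filter`, its parameters, and the example numbers are all hypothetical.

```python
import heapq
from typing import List, Sequence


def top_k_filter(scores: Sequence[float], feasible: Sequence[bool], k: int) -> List[int]:
    """Return the indices of the k highest-scoring feasible PMs, best first.

    Assumption (not from the paper): one scalar score per PM, higher is better.
    """
    # Drop PMs that cannot host the VM at all (e.g. not enough free CPU or memory).
    candidates = [i for i, ok in enumerate(feasible) if ok]
    # Keep only the k best remaining PMs; fewer are returned if fewer are feasible.
    return heapq.nlargest(k, candidates, key=lambda i: scores[i])


# Hypothetical per-PM scores for a 6-machine cluster (illustrative, not paper data).
pm_scores = [0.2, 0.9, 0.1, 0.7, 0.4, 0.8]
pm_feasible = [True, True, True, False, True, True]  # PM 3 cannot host the VM
shortlist = top_k_filter(pm_scores, pm_feasible, k=3)
print(shortlist)  # [1, 5, 4]
```

Restricting exploration to such a shortlist shrinks the action space from the number of PMs to $k$, which is one plausible reading of how a filter of this kind could improve exploration efficiency in larger clusters.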