Saved in:
Bibliographic Details
Main Authors: Liu, Jialei, Koksal, C. Emre, Shi, Ming
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2601.18936
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866914341179097088
author Liu, Jialei
Koksal, C. Emre
Shi, Ming
author_facet Liu, Jialei
Koksal, C. Emre
Shi, Ming
contents We study a bi-level online provisioning and scheduling problem motivated by network resource allocation, where provisioning decisions are made at a slow time scale while queue-/state-dependent scheduling is performed at a fast time scale. We model this two-time-scale interaction using an upper-level online convex optimization (OCO) problem and a lower-level constrained Markov decision process (CMDP). Existing OCO typically assumes stateless decisions and thus cannot capture MDP network dynamics such as queue evolution. Meanwhile, CMDP algorithms typically assume a fixed constraint threshold, whereas in provisioning-and-scheduling systems, the threshold varies with online budget decisions. To address these gaps, we study bi-level OCO-CMDP learning under switching costs (budget reprovisioning/system reconfiguration) and cross-level constraints that couple budgets to scheduling decisions. Our new algorithm solves this learning problem via several non-trivial developments, including a carefully designed dual feedback that returns the budget multiplier as sensitivity information for the upper-level update and a lower level that solves a budget-adaptive safe exploration problem via an extended occupancy-measure linear program. We establish near-optimal regret and high-probability satisfaction of the cross-level constraints.
format Preprint
id arxiv_https___arxiv_org_abs_2601_18936
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Bi-Level Online Provisioning and Scheduling with Switching Costs and Cross-Level Constraints
Liu, Jialei
Koksal, C. Emre
Shi, Ming
Machine Learning
We study a bi-level online provisioning and scheduling problem motivated by network resource allocation, where provisioning decisions are made at a slow time scale while queue-/state-dependent scheduling is performed at a fast time scale. We model this two-time-scale interaction using an upper-level online convex optimization (OCO) problem and a lower-level constrained Markov decision process (CMDP). Existing OCO typically assumes stateless decisions and thus cannot capture MDP network dynamics such as queue evolution. Meanwhile, CMDP algorithms typically assume a fixed constraint threshold, whereas in provisioning-and-scheduling systems, the threshold varies with online budget decisions. To address these gaps, we study bi-level OCO-CMDP learning under switching costs (budget reprovisioning/system reconfiguration) and cross-level constraints that couple budgets to scheduling decisions. Our new algorithm solves this learning problem via several non-trivial developments, including a carefully designed dual feedback that returns the budget multiplier as sensitivity information for the upper-level update and a lower level that solves a budget-adaptive safe exploration problem via an extended occupancy-measure linear program. We establish near-optimal regret and high-probability satisfaction of the cross-level constraints.
title Bi-Level Online Provisioning and Scheduling with Switching Costs and Cross-Level Constraints
topic Machine Learning
url https://arxiv.org/abs/2601.18936