Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Xianzhi, Xu, Yue, Zhu, Yinlin, Wu, Di, Zhou, Yipeng, Hu, Miao, Quan, Guocong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.06403
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866917318672515072
author	Zhang, Xianzhi Xu, Yue Zhu, Yinlin Wu, Di Zhou, Yipeng Hu, Miao Quan, Guocong
author_facet	Zhang, Xianzhi Xu, Yue Zhu, Yinlin Wu, Di Zhou, Yipeng Hu, Miao Quan, Guocong
contents	Multi-modal large language model (MLLM) inference scheduling enables strong response quality under practical and heterogeneous budgets, beyond what a homogeneous single-backend setting can offer. Yet online MLLM task scheduling is nontrivial, as requests vary sharply in modality composition and latent reasoning difficulty, while execution backends incur distinct, time-varying costs due to system jitter and network variation. These coupled uncertainties pose two core challenges: deriving semantically faithful yet scheduling-relevant multi-modal task representations, and making low-overhead online decisions over irreversible multi-dimensional budgets. Accordingly, we propose \emph{M-CMAB} (\underline{M}ulti-modal \underline{M}ulti-constraint \underline{C}ontextual \underline{M}ulti-\underline{A}rmed \underline{B}andit), a multi-adapter-enhanced MLLM inference scheduling framework with three components: (i) a CLS-attentive, frozen-backbone \emph{Predictor} that extracts compact task representations and updates only lightweight adapters for action-specific estimation; (ii) a primal-dual \emph{Constrainer} that maintains online Lagrange multipliers to enforce long-horizon constraints via per-round objectives; and (iii) a two-phase \emph{Scheduler} that balances exploration and exploitation under irreversible budgets. We establish a regret guarantee under multi-dimensional knapsack constraints. On a composite multimodal benchmark with heterogeneous backends, \emph{M-CMAB} consistently outperforms state-of-the-art baselines across budget regimes, achieving up to 14.18% higher reward and closely tracking an oracle-aided upper bound. Codes are available at https://anonymous.4open.science/r/M2CMAB/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_06403
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling Zhang, Xianzhi Xu, Yue Zhu, Yinlin Wu, Di Zhou, Yipeng Hu, Miao Quan, Guocong Machine Learning Multi-modal large language model (MLLM) inference scheduling enables strong response quality under practical and heterogeneous budgets, beyond what a homogeneous single-backend setting can offer. Yet online MLLM task scheduling is nontrivial, as requests vary sharply in modality composition and latent reasoning difficulty, while execution backends incur distinct, time-varying costs due to system jitter and network variation. These coupled uncertainties pose two core challenges: deriving semantically faithful yet scheduling-relevant multi-modal task representations, and making low-overhead online decisions over irreversible multi-dimensional budgets. Accordingly, we propose \emph{M-CMAB} (\underline{M}ulti-modal \underline{M}ulti-constraint \underline{C}ontextual \underline{M}ulti-\underline{A}rmed \underline{B}andit), a multi-adapter-enhanced MLLM inference scheduling framework with three components: (i) a CLS-attentive, frozen-backbone \emph{Predictor} that extracts compact task representations and updates only lightweight adapters for action-specific estimation; (ii) a primal-dual \emph{Constrainer} that maintains online Lagrange multipliers to enforce long-horizon constraints via per-round objectives; and (iii) a two-phase \emph{Scheduler} that balances exploration and exploitation under irreversible budgets. We establish a regret guarantee under multi-dimensional knapsack constraints. On a composite multimodal benchmark with heterogeneous backends, \emph{M-CMAB} consistently outperforms state-of-the-art baselines across budget regimes, achieving up to 14.18% higher reward and closely tracking an oracle-aided upper bound. Codes are available at https://anonymous.4open.science/r/M2CMAB/.
title	Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling
topic	Machine Learning
url	https://arxiv.org/abs/2603.06403

Similar Items