Salvato in:
Dettagli Bibliografici
Autori principali: Tang, Yuhan, Cui, Kangxin, Park, Jung Ho, Zhao, Yibo, Jiang, Xuan, He, Haoze, Yu, Jiangbo, Koutsopoulos, Haris, Zhao, Jinhua
Natura: Preprint
Pubblicazione: 2025
Soggetti:
Accesso online:https://arxiv.org/abs/2512.13727
Tags: Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
_version_ 1866909012314816512
author Tang, Yuhan
Cui, Kangxin
Park, Jung Ho
Zhao, Yibo
Jiang, Xuan
He, Haoze
Yu, Jiangbo
Koutsopoulos, Haris
Zhao, Jinhua
author_facet Tang, Yuhan
Cui, Kangxin
Park, Jung Ho
Zhao, Yibo
Jiang, Xuan
He, Haoze
Yu, Jiangbo
Koutsopoulos, Haris
Zhao, Jinhua
contents Ride-hailing platforms face the challenge of balancing passenger waiting times with overall system efficiency under highly uncertain supply-demand conditions. Adaptive delayed matching, which controls the holding intervals for batched sets of requests and vehicles, reveals an inherent trade-off between matching and pickup delays. The resulting environment with temporally varying request arrival patterns and dynamic congestion calls for more expressive networks with sufficient capacity to capture their non-stationarity. To address the limitations of existing methods that rely on shallow encoders that cannot capture dynamic supply-demand patterns and congestion effects, we introduce the Regime-Aware Spatio-Temporal Mixture-of-Experts (RAST-MoE) framework, which formalizes adaptive delayed matching as a regime-aware Markov Decision Process and equips RL agents with a self-attention MoE encoder. Instead of relying on a single monolithic network, our design allows different experts to specialize automatically in varying operational conditions, improving representation capacity while maintaining per-sample computation efficiency. Despite its modest size of only 12M parameters, our framework consistently outperforms strong baselines. On real-world Uber trajectory data from San Francisco, it reduces average matching delay by 10%, and pickup delay by 15%. In addition, it demonstrates robustness to unseen demand regimes, stable training behavior without reward hacking, and expert specialization to different regimes. This study shows the strength of MoE-enhanced RL for large-scale decision-making tasks with complex spatiotemporal dynamics.
format Preprint
id arxiv_https___arxiv_org_abs_2512_13727
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing
Tang, Yuhan
Cui, Kangxin
Park, Jung Ho
Zhao, Yibo
Jiang, Xuan
He, Haoze
Yu, Jiangbo
Koutsopoulos, Haris
Zhao, Jinhua
Machine Learning
Ride-hailing platforms face the challenge of balancing passenger waiting times with overall system efficiency under highly uncertain supply-demand conditions. Adaptive delayed matching, which controls the holding intervals for batched sets of requests and vehicles, reveals an inherent trade-off between matching and pickup delays. The resulting environment with temporally varying request arrival patterns and dynamic congestion calls for more expressive networks with sufficient capacity to capture their non-stationarity. To address the limitations of existing methods that rely on shallow encoders that cannot capture dynamic supply-demand patterns and congestion effects, we introduce the Regime-Aware Spatio-Temporal Mixture-of-Experts (RAST-MoE) framework, which formalizes adaptive delayed matching as a regime-aware Markov Decision Process and equips RL agents with a self-attention MoE encoder. Instead of relying on a single monolithic network, our design allows different experts to specialize automatically in varying operational conditions, improving representation capacity while maintaining per-sample computation efficiency. Despite its modest size of only 12M parameters, our framework consistently outperforms strong baselines. On real-world Uber trajectory data from San Francisco, it reduces average matching delay by 10%, and pickup delay by 15%. In addition, it demonstrates robustness to unseen demand regimes, stable training behavior without reward hacking, and expert specialization to different regimes. This study shows the strength of MoE-enhanced RL for large-scale decision-making tasks with complex spatiotemporal dynamics.
title RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing
topic Machine Learning
url https://arxiv.org/abs/2512.13727