Bibliographic Details
Main Authors: Wang, Yuepeng, Kawano, Ken, Zhou, Yongqi, Fujisawa, Yoshihiko, Weiss, Richard, Wachi, Akifumi, Fujisawa, Katsuki, Chen, Ying, Sadria, Mehrshad, Liu, Xin, Kim, Kyoung-Sook, Hu, Xiao, Gros, Sebastien, Shen, Xun
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2606.01028
_version_ 1866911736366366720
author Wang, Yuepeng
Kawano, Ken
Zhou, Yongqi
Fujisawa, Yoshihiko
Weiss, Richard
Wachi, Akifumi
Fujisawa, Katsuki
Chen, Ying
Sadria, Mehrshad
Liu, Xin
Kim, Kyoung-Sook
Hu, Xiao
Gros, Sebastien
Shen, Xun
contents Medical treatment recommendation poses several challenges for reinforcement learning (RL): patient physiology evolves in continuous time, measurements and interventions occur at irregular intervals, and treatment effects vary substantially across individuals. Existing RL formulations and simulated environments, however, are based on discrete-time MDP or POMDP abstractions with fixed or pre-specified decision intervals. It therefore remains difficult to evaluate whether RL methods can handle time-interval-dependent disease progression, personalized treatment response, and safety between consecutive measurement points. To address this gap, we introduce MedGym, a benchmark environment for dynamic treatment recommendation. MedGym models longitudinal patient evolution in a continuous-time framework and constructs a configurable medical RL benchmark from clinical data using Physics-Informed Neural Networks. The resulting benchmark supports both offline and online RL and enables direct comparison between discrete-time and continuous-time methods under irregular treatment timing and patient-specific dynamics. MedGym also supports evaluation from clinically important perspectives, including personalization, trajectory-level safety, and the performance gap between model-based offline learning and online deployment. By providing a standardized and configurable benchmark for continuous-time dynamic treatment, MedGym aims to facilitate more realistic and informative evaluation of medical RL methods.
format Preprint
id arxiv_https___arxiv_org_abs_2606_01028
institution arXiv
publishDate 2026
record_format arxiv
title MedGym: A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning
topic Machine Learning
url https://arxiv.org/abs/2606.01028