Staff View: Portfolio Reinforcement Learning with Scenario-Context Rollout :: Library Catalog

Bibliographic Details
Main Authors:	Bendatu, Vanya Priscillia; Lu, Yao
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.24037

_version_	1866917299705872384
author	Bendatu, Vanya Priscillia; Lu, Yao
author_facet	Bendatu, Vanya Priscillia; Lu, Yao
contents	Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR) that generates plausible next-day multivariate return scenarios under stress events. However, doing so faces new challenges, as history will never tell what would have happened differently. As a result, incorporating scenario-based rewards from rollouts introduces a reward-transition mismatch in temporal-difference learning, destabilizing RL critic training. We analyze this inconsistency and show it leads to a mixed evaluation target. Guided by this analysis, we construct a counterfactual next state using the rollout-implied continuations and augment the critic agent's bootstrap target. Doing so stabilizes the learning and provides a viable bias-variance tradeoff. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, our method improves Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% compared with classic and RL-based portfolio rebalancing baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_24037
institution	arXiv
publishDate	2026
record_format	arxiv
title	Portfolio Reinforcement Learning with Scenario-Context Rollout
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.24037