Bibliographic Details
Main Authors: Li, Yao-Hui, Wang, Zeyu, Li, Xin, Pang, Wei, Yuan, Yingfang, Chen, Zhengkun, Zhang, Boya, Islam, Riashat, Lamb, Alex, Zhang, Yonggang
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.03201
Abstract:
  • Model-based reinforcement learning (MBRL) is sample-efficient but struggles in sparse-reward settings. A critical bottleneck is the lack of informative gradients: under sparse rewards, standard reward models often yield flat landscapes that provide little guidance for planning. To address this challenge, we propose Shaping Landscapes with Optimistic Potential Estimates (SLOPE), a novel framework that shifts reward modeling from predicting sparse scalars to constructing informative potential landscapes. SLOPE employs optimistic distributional regression to estimate high-confidence upper bounds, which amplifies rare success signals and ensures sufficient exploration gradients. Evaluations on 30+ tasks across 5 benchmarks, as well as real-world robotic deployments, demonstrate that SLOPE consistently outperforms leading baselines in fully sparse, semi-sparse, and dense reward settings.
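The abstract does not specify SLOPE's regression objective, so the following is only a rough illustration of why an optimistic, upper-tail estimate amplifies rare success signals compared with ordinary mean regression. It assumes expectile regression (asymmetric least squares) as a stand-in for "optimistic distributional regression" and standard potential-based shaping, r + γΦ(s′) − Φ(s), as one way a potential could be used. Neither choice is confirmed to be the authors' method. The names `expectile` and `shaped_reward` are hypothetical.

```python
import numpy as np


def expectile(samples: np.ndarray, tau: float, iters: int = 100, tol: float = 1e-12) -> float:
    """Scalar tau-expectile via iteratively reweighted averaging.

    Minimizes sum_i |tau - 1[x_i < e]| * (x_i - e)**2. The stationary point is a
    weighted mean whose weights depend on which side of e each sample lies.
    tau = 0.5 recovers the ordinary mean; tau > 0.5 leans toward the upper tail.
    (Assumption: expectile regression stands in for SLOPE's optimistic regression.)
    """
    e = float(np.mean(samples))  # start from the mean (the 0.5-expectile)
    for _ in range(iters):
        # Samples above the estimate get weight tau, those at or below get 1 - tau.
        w = np.where(samples > e, tau, 1.0 - tau)
        new_e = float(np.sum(w * samples) / np.sum(w))
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


def shaped_reward(r: float, phi_s: float, phi_next: float, gamma: float = 0.99) -> float:
    """Standard potential-based shaping: r + gamma * phi(s') - phi(s).

    (Assumption: illustrates how a potential estimate could enter the reward;
    not taken from the paper.)
    """
    return r + gamma * phi_next - phi_s


# Hypothetical outcomes for one state: success observed in 5 of 100 rollouts.
outcomes = np.array([1.0] * 5 + [0.0] * 95)
mean_est = expectile(outcomes, tau=0.5)        # ≈ 0.05
optimistic_est = expectile(outcomes, tau=0.9)  # ≈ 0.321 (= 4.5 / 14)
```

With only 5% successes, the mean estimate stays close to zero and leaves a nearly flat landscape, whereas the 0.9-expectile is over six times larger, so a potential built from it yields a noticeably steeper shaping term between states that differ in their rare-success outcomes.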