:: Library Catalog

Bibliographic Details
Main Authors:	Cohen, Lior; Nabati, Ofir; Wang, Kaixin; Kumar, Navdeep; Mannor, Shie
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.08032

Similar Items

Representation-Driven Reinforcement Learning
by: Nabati, Ofir, et al.
Published: (2023)

Improving Token-Based World Models with Parallel Observation Prediction
by: Cohen, Lior, et al.
Published: (2024)

Simulus: Combining Improvements in Sample-Efficient World Model Agents
by: Cohen, Lior, et al.
Published: (2025)

Spectral Bellman Method: Unifying Representation and Exploration in RL
by: Nabati, Ofir, et al.
Published: (2025)

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel
by: Wang, Kaixin, et al.
Published: (2023)

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum
by: Kumar, Navdeep, et al.
Published: (2026)

Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead
by: Koren, Uri, et al.
Published: (2025)

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes
by: Kumar, Navdeep, et al.
Published: (2024)

On the Convergence of Single-Timescale Actor-Critic
by: Kumar, Navdeep, et al.
Published: (2024)

Efficient Fairness-Performance Pareto Front Computation
by: Kozdoba, Mark, et al.
Published: (2024)

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
by: Gadot, Uri, et al.
Published: (2023)

MinMaxMin $Q$-learning
by: Soffair, Nitsan, et al.
Published: (2024)

Conservative DDPG -- Pessimistic RL without Ensemble
by: Soffair, Nitsan, et al.
Published: (2024)

Representative Action Selection for Large Action Space: From Bandits to MDPs
by: Zhou, Quan, et al.
Published: (2025)

Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels
by: Perets, Binyamin, et al.
Published: (2026)

Sobolev Space Regularised Pre Density Models
by: Kozdoba, Mark, et al.
Published: (2023)

Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes
by: Kumar, Navdeep, et al.
Published: (2025)

Tree Search-Based Policy Optimization under Stochastic Execution Delay
by: Valensi, David, et al.
Published: (2024)

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)

The Value of Mechanistic Priors in Sequential Decision Making
by: Shufaro, Itai, et al.
Published: (2026)

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
by: Kwon, Jeongyeol, et al.
Published: (2024)

Policy Gradient with Tree Expansion
by: Dalal, Gal, et al.
Published: (2023)

Representative Action Selection for Large Action Space Bandit Families
by: Zhou, Quan, et al.
Published: (2025)

Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
by: Vainshtein, Ron, et al.
Published: (2025)

On Bits and Bandits: Quantifying the Regret-Information Trade-off
by: Shufaro, Itai, et al.
Published: (2024)

A Classification View on Meta Learning Bandits
by: Mutti, Mirco, et al.
Published: (2025)

SQT -- std $Q$-target
by: Soffair, Nitsan, et al.
Published: (2024)

Reinforcement Learning with Segment Feedback
by: Du, Yihan, et al.
Published: (2025)

DiffusionRollout: Uncertainty-Aware Rollout Planning in Long-Horizon PDE Solving
by: Yoo, Seungwoo, et al.
Published: (2026)

ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2025)

VLM-Guided Experience Replay
by: Sharony, Elad, et al.
Published: (2026)

RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
by: Gadot, Uri, et al.
Published: (2025)

State Entropy Regularization for Robust Reinforcement Learning
by: Ashlag, Yonatan, et al.
Published: (2025)

Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management
by: Avin, Chen, et al.
Published: (2025)

DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning
by: Zhao, Hanye, et al.
Published: (2024)

Learning Multiple Initial Solutions to Optimization Problems
by: Sharony, Elad, et al.
Published: (2024)

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
by: Zhai, Yuanzhao, et al.
Published: (2024)

SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts
by: Li, Jiayi, et al.
Published: (2026)

Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
by: Fuhrer, Benjamin, et al.
Published: (2022)

Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)