| Main Authors: | Cohen, Lior, Nabati, Ofir, Wang, Kaixin, Kumar, Navdeep, Mannor, Shie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08032 |
Similar Items
Representation-Driven Reinforcement Learning
by: Nabati, Ofir, et al.
Published: (2023)
Improving Token-Based World Models with Parallel Observation Prediction
by: Cohen, Lior, et al.
Published: (2024)
Simulus: Combining Improvements in Sample-Efficient World Model Agents
by: Cohen, Lior, et al.
Published: (2025)
Spectral Bellman Method: Unifying Representation and Exploration in RL
by: Nabati, Ofir, et al.
Published: (2025)
Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel
by: Wang, Kaixin, et al.
Published: (2023)
Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum
by: Kumar, Navdeep, et al.
Published: (2026)
Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead
by: Koren, Uri, et al.
Published: (2025)
On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes
by: Kumar, Navdeep, et al.
Published: (2024)
On the Convergence of Single-Timescale Actor-Critic
by: Kumar, Navdeep, et al.
Published: (2024)
Efficient Fairness-Performance Pareto Front Computation
by: Kozdoba, Mark, et al.
Published: (2024)
Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
by: Gadot, Uri, et al.
Published: (2023)
MinMaxMin $Q$-learning
by: Soffair, Nitsan, et al.
Published: (2024)
Conservative DDPG -- Pessimistic RL without Ensemble
by: Soffair, Nitsan, et al.
Published: (2024)
Representative Action Selection for Large Action Space: From Bandits to MDPs
by: Zhou, Quan, et al.
Published: (2025)
Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels
by: Perets, Binyamin, et al.
Published: (2026)
Sobolev Space Regularised Pre Density Models
by: Kozdoba, Mark, et al.
Published: (2023)
Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes
by: Kumar, Navdeep, et al.
Published: (2025)
Tree Search-Based Policy Optimization under Stochastic Execution Delay
by: Valensi, David, et al.
Published: (2024)
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)
The Value of Mechanistic Priors in Sequential Decision Making
by: Shufaro, Itai, et al.
Published: (2026)
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
by: Kwon, Jeongyeol, et al.
Published: (2024)
Policy Gradient with Tree Expansion
by: Dalal, Gal, et al.
Published: (2023)
Representative Action Selection for Large Action Space Bandit Families
by: Zhou, Quan, et al.
Published: (2025)
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
by: Vainshtein, Ron, et al.
Published: (2025)
On Bits and Bandits: Quantifying the Regret-Information Trade-off
by: Shufaro, Itai, et al.
Published: (2024)
A Classification View on Meta Learning Bandits
by: Mutti, Mirco, et al.
Published: (2025)
SQT -- std $Q$-target
by: Soffair, Nitsan, et al.
Published: (2024)
Reinforcement Learning with Segment Feedback
by: Du, Yihan, et al.
Published: (2025)
DiffusionRollout: Uncertainty-Aware Rollout Planning in Long-Horizon PDE Solving
by: Yoo, Seungwoo, et al.
Published: (2026)
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2025)
VLM-Guided Experience Replay
by: Sharony, Elad, et al.
Published: (2026)
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
by: Gadot, Uri, et al.
Published: (2025)
State Entropy Regularization for Robust Reinforcement Learning
by: Ashlag, Yonatan, et al.
Published: (2025)
Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management
by: Avin, Chen, et al.
Published: (2025)
DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning
by: Zhao, Hanye, et al.
Published: (2024)
Learning Multiple Initial Solutions to Optimization Problems
by: Sharony, Elad, et al.
Published: (2024)
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
by: Zhai, Yuanzhao, et al.
Published: (2024)
SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts
by: Li, Jiayi, et al.
Published: (2026)
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
by: Fuhrer, Benjamin, et al.
Published: (2022)
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)