| Main Authors: | Kohler, Hector; Akrour, Riad; Preux, Philippe |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2309.12701 |
Similar Items
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
by: Kohler, Hector, et al.
Published: (2023)
Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
by: Kohler, Hector, et al.
Published: (2024)
Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs
by: Kohler, Hector, et al.
Published: (2025)
When (and How) to Trust the Expert: Diagnosing Query-Time Expert-Guided Reinforcement Learning
by: Berthelot, Yann, et al.
Published: (2026)
PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning
by: Driss, Brahim, et al.
Published: (2025)
Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop
by: Kohler, Hector, et al.
Published: (2024)
StaQ it! Growing neural networks for Policy Mirror Descent
by: Shilova, Alena, et al.
Published: (2025)
Augmented Bayesian Policy Search
by: Kallel, Mahdi, et al.
Published: (2024)
IDEQ: an improved diffusion model for the TSP
by: Basson, Mickael, et al.
Published: (2024)
AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
by: Mathieu, Timothée, et al.
Published: (2023)
Bandits attack function optimization
by: Preux, Philippe, et al.
Published: (2026)
End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions
by: Mhammedi, Zakaria, et al.
Published: (2026)
Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States
by: Chen, Yujiao
Published: (2026)
Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity
by: Muni, Aneri, et al.
Published: (2026)
Leo Breiman, the Rashomon Effect, and the Occam Dilemma
by: Rudin, Cynthia
Published: (2025)
RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity
by: Kuchar, Chris
Published: (2026)
Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance
by: van der Linden, Jacobus G. M., et al.
Published: (2024)
Robust Multi-Agent Path Finding under Observation Attacks: A Principled Adversarial-Plus-Smoothing Training Recipe
by: Ahmed, Riad
Published: (2026)
An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs
by: Liu, Haolin, et al.
Published: (2025)
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
by: Zhong, Han, et al.
Published: (2021)
Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs
by: Wang, Jing, et al.
Published: (2026)
Predicting Multi-Drug Resistance in Bacterial Isolates Through Performance Comparison and LIME-based Interpretation of Classification Models
by: Wishal, Santanam, et al.
Published: (2026)
Bellman Diffusion Models
by: Schramm, Liam, et al.
Published: (2024)
Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain
by: Wang, Shengbo, et al.
Published: (2025)
Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
by: Gadot, Uri, et al.
Published: (2023)
Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization
by: Satheesh, Anirudh, et al.
Published: (2026)
Stability and Generalization for Bellman Residuals
by: Kang, Enoch H., et al.
Published: (2025)
Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem
by: Yang, Hyukjun, et al.
Published: (2026)
Bellman Error Centering
by: Chen, Xingguo, et al.
Published: (2025)
Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints
by: Xu, Tian, et al.
Published: (2026)
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
by: Bajpai, Divya Jyoti, et al.
Published: (2025)
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
by: Schmied, Thomas, et al.
Published: (2025)
Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting
by: Ko, Hojin, et al.
Published: (2026)
Accelerating Matrix Diagonalization through Decision Transformers with Epsilon-Greedy Optimization
by: Bhatta, Kshitij, et al.
Published: (2024)
Time-Constrained Robust MDPs
by: Zouitine, Adil, et al.
Published: (2024)
Parameterized Projected Bellman Operator
by: Vincent, Théo, et al.
Published: (2023)
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning
by: Omura, Motoki, et al.
Published: (2025)
Distributional Bellman Operators over Mean Embeddings
by: Wenliang, Li Kevin, et al.
Published: (2023)
ShiQ: Bringing back Bellman to LLMs
by: Clavier, Pierre, et al.
Published: (2025)
Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs
by: Hong, Kihyuk, et al.
Published: (2024)