| Main Authors: | Malato, Federico, Hautamaki, Ville |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.01558 |
Similar Items
Online Adaptation for Enhancing Imitation Learning Policies
by: Malato, Federico, et al.
Published: (2024)
Zero-shot World Models via Search in Memory
by: Malato, Federico, et al.
Published: (2025)
Zero-shot Imitation Policy via Search in Demonstration Dataset
by: Malato, Federico, et al.
Published: (2024)
Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning
by: Goodall, Alexander W., et al.
Published: (2025)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
by: Zhou, Hongyi, et al.
Published: (2025)
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
by: Lee, Haanvid, et al.
Published: (2024)
Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
by: Palenicek, Daniel, et al.
Published: (2025)
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning
by: Zhuang, Yuan, et al.
Published: (2026)
Automated Off-Policy Estimator Selection via Supervised Learning
by: Felicioni, Nicolò, et al.
Published: (2024)
Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning
by: Narang, Adhyyan, et al.
Published: (2024)
On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning
by: Canonaco, Giuseppe, et al.
Published: (2024)
COSBO: Conservative Offline Simulation-Based Policy Optimization
by: Kargar, Eshagh, et al.
Published: (2024)
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
by: Kang, Hyungkyu, et al.
Published: (2025)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences
by: Singh, Nikhil Kumar, et al.
Published: (2024)
Zero-Shot Off-Policy Learning
by: Asadulaev, Arip, et al.
Published: (2026)
Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective
by: Duan, Tianyang, et al.
Published: (2025)
Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction
by: He, Yiting, et al.
Published: (2025)
Off-Policy Correction For Multi-Agent Reinforcement Learning
by: Zawalski, Michał, et al.
Published: (2021)
Regret-Based Defense in Adversarial Reinforcement Learning
by: Belaire, Roman, et al.
Published: (2023)
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
by: Zhang, Wenhao, et al.
Published: (2025)
CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms
by: Yenicesu, Arda Sarp, et al.
Published: (2024)
Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning
by: Okudo, Takato, et al.
Published: (2021)
A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
by: Patterson, Andrew, et al.
Published: (2021)
MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model Effectiveness and Efficiency
by: Liu, Xiaoyun, et al.
Published: (2023)
LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency
by: Liu, Xiao-Yin, et al.
Published: (2024)
When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
by: Liu, Vincent, et al.
Published: (2023)
Learning Action Embeddings for Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2023)
Shortcut Learning in Binary Classifier Black Boxes: Applications to Voice Anti-Spoofing and Biometrics
by: Sahidullah, Md, et al.
Published: (2026)
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning
by: Guo, Yihong, et al.
Published: (2025)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions
by: Tuononen, Marko, et al.
Published: (2024)
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
by: Weltevrede, Max, et al.
Published: (2025)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)
Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance
by: McClellan, Joshua, et al.
Published: (2024)
Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity
by: Johnson, Emmeran, et al.
Published: (2023)
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
by: Melo, Luckeciano C., et al.
Published: (2025)
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
by: Moradipari, Ahmadreza, et al.
Published: (2023)
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
by: Nakanishi, Kosuke, et al.
Published: (2025)