| Main Authors: | Saito, Yuta; Nomura, Masahiro |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.15084 |
Similar Items
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
by: Kiyohara, Haruka, et al.
Published: (2024)
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
by: Shimizu, Tatsuhiro, et al.
Published: (2024)
A General Framework for Off-Policy Learning with Partially-Observed Reward
by: Takehi, Rikiya, et al.
Published: (2025)
Off-Policy Evaluation and Learning for Matching Markets
by: Hayashi, Yudai, et al.
Published: (2025)
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
by: Saito, Yuta, et al.
Published: (2024)
Long-term Off-Policy Evaluation and Learning
by: Saito, Yuta, et al.
Published: (2024)
Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
by: Kubota, Kohsuke, et al.
Published: (2026)
Off-Policy Learning with Limited Supply
by: Tanaka, Koichi, et al.
Published: (2026)
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
by: Kishimoto, Ren, et al.
Published: (2026)
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
by: Shimizu, Tatsuhiro, et al.
Published: (2025)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
by: Tanaka, Koichi, et al.
Published: (2026)
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
by: Tian, Minghao, et al.
Published: (2026)
Sequential Policy Gradient for Adaptive Hyperparameter Optimization
by: Li, Zheng, et al.
Published: (2025)
Hyperparameter Optimization in Machine Learning
by: Franceschi, Luca, et al.
Published: (2024)
LLMs Can Learn to Reason Via Off-Policy RL
by: Ritter, Daniel, et al.
Published: (2026)
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization
by: Nagler, Thomas, et al.
Published: (2024)
GRPOformer: Advancing Hyperparameter Optimization via Group Relative Policy Optimization
by: Guo, Haoxin, et al.
Published: (2025)
Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation
by: Yokozawa, Riko, et al.
Published: (2025)
Overtuning in Hyperparameter Optimization
by: Schneider, Lennart, et al.
Published: (2025)
Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
by: Ferreira, Fabio, et al.
Published: (2026)
Adaptive Hyperparameter Optimization for Continual Learning Scenarios
by: Semola, Rudy, et al.
Published: (2024)
Instance-wise Supervision-level Optimization in Active Learning
by: Matsuo, Shinnosuke, et al.
Published: (2025)
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization
by: Xia, Linxuan, et al.
Published: (2026)
Pessimistic Off-Policy Optimization for Learning to Rank
by: Cief, Matej, et al.
Published: (2022)
Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview
by: Karl, Florian, et al.
Published: (2022)
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
by: Zheng, Haizhong, et al.
Published: (2025)
Transductive Off-policy Proximal Policy Optimization
by: Gan, Yaozhong, et al.
Published: (2024)
Machine Learning for Climate Policy: Understanding Policy Progression in the European Green Deal
by: West, Patricia, et al.
Published: (2025)
A Memetic Algorithm based on Variational Autoencoder for Black-Box Discrete Optimization with Epistasis among Parameters
by: Kato, Aoi, et al.
Published: (2025)
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
by: Mroueh, Youssef, et al.
Published: (2025)
Dynamic Priors in Bayesian Optimization for Hyperparameter Optimization
by: Fehring, Lukas, et al.
Published: (2025)
Causal-Policy Forest for End-to-End Policy Learning
by: Kato, Masahiro
Published: (2025)
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning
by: Becktepe, Jannis, et al.
Published: (2024)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning
by: Kim, Minyoung, et al.
Published: (2024)
Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization
by: Sambharya, Rajiv, et al.
Published: (2024)
General Bayesian Policy Learning
by: Kato, Masahiro
Published: (2026)
In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
by: Rakotoarison, Herilalaina, et al.
Published: (2024)
Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning
by: Goodall, Alexander W., et al.
Published: (2025)