| Main Authors: | Kiyohara, Haruka; Nomura, Masahiro; Saito, Yuta |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.02171 |
Similar Items
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
by: Shimizu, Tatsuhiro, et al.
Published: (2024)
Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
by: Saito, Yuta, et al.
Published: (2024)
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Prompt Optimization with Logged Bandit Data
by: Kiyohara, Haruka, et al.
Published: (2025)
Off-Policy Evaluation and Learning for Matching Markets
by: Hayashi, Yudai, et al.
Published: (2025)
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
by: Saito, Yuta, et al.
Published: (2024)
Long-term Off-Policy Evaluation and Learning
by: Saito, Yuta, et al.
Published: (2024)
Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
by: Kubota, Kohsuke, et al.
Published: (2026)
Policy Design for Two-sided Platforms with Participation Dynamics
by: Kiyohara, Haruka, et al.
Published: (2025)
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
by: Tanaka, Koichi, et al.
Published: (2026)
A General Framework for Off-Policy Learning with Partially-Observed Reward
by: Takehi, Rikiya, et al.
Published: (2025)
Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities
by: Xie, Ziwen, et al.
Published: (2026)
Fast Slate Policy Optimization: Going Beyond Plackett-Luce
by: Sakhi, Otmane, et al.
Published: (2023)
Off-Policy Learning with Limited Supply
by: Tanaka, Koichi, et al.
Published: (2026)
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
by: Shimizu, Tatsuhiro, et al.
Published: (2025)
Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback
by: Goyal, Tanmay, et al.
Published: (2025)
Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
by: Zhan, Ruohan, et al.
Published: (2021)
PAC Off-Policy Prediction of Contextual Bandits
by: Wan, Yilong, et al.
Published: (2025)
Context-Action Embedding Learning for Off-Policy Evaluation in Contextual Bandits
by: Chandak, Kushagra, et al.
Published: (2025)
Safely Exploring Novel Actions in Recommender Systems via Deployment-Efficient Policy Learning
by: Kiyohara, Haruka, et al.
Published: (2025)
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
by: Chaudhari, Shreyas, et al.
Published: (2024)
Optimal Baseline Corrections for Off-Policy Contextual Bandits
by: Gupta, Shashank, et al.
Published: (2024)
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
by: Kishimoto, Ren, et al.
Published: (2026)
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
by: Mroueh, Youssef, et al.
Published: (2025)
Prompt-to-Slate: Diffusion Models for Prompt-Conditioned Slate Generation
by: Tomasi, Federico, et al.
Published: (2024)
Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy
by: Lee, Kyungbok, et al.
Published: (2024)
Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling
by: Takahashi, Tatsuki, et al.
Published: (2025)
Cross-Validated Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2024)
Optimal Regret for Policy Optimization in Contextual Bandits
by: Levy, Orin, et al.
Published: (2026)
MultiScale Contextual Bandits for Long Term Objectives
by: Rastogi, Richa, et al.
Published: (2025)
When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective
by: Sun, Hao, et al.
Published: (2023)
Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation
by: Yokozawa, Riko, et al.
Published: (2025)
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization
by: Xia, Linxuan, et al.
Published: (2026)
Transductive Off-policy Proximal Policy Optimization
by: Gan, Yaozhong, et al.
Published: (2024)
Concept-driven Off Policy Evaluation
by: Majumdar, Ritam, et al.
Published: (2024)
Clustering Context in Off-Policy Evaluation
by: Guzman-Olivares, Daniel, et al.
Published: (2025)
Combinatorial Allocation Bandits with Nonlinear Arm Utility
by: Shibukawa, Yuki, et al.
Published: (2026)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
by: Lobo, Elita, et al.
Published: (2024)