| Main Authors: | Shimizu, Tatsuhiro; Tanaka, Koichi; Kishimoto, Ren; Kiyohara, Haruka; Nomura, Masahiro; Saito, Yuta |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2408.11202 |
Similar Items
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
by: Kiyohara, Haruka, et al.
Published: (2024)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Prompt Optimization with Logged Bandit Data
by: Kiyohara, Haruka, et al.
Published: (2025)
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
by: Shimizu, Tatsuhiro, et al.
Published: (2025)
Off-Policy Learning with Limited Supply
by: Tanaka, Koichi, et al.
Published: (2026)
Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
by: Saito, Yuta, et al.
Published: (2024)
Offline Contextual Bandits in the Presence of New Actions
by: Kishimoto, Ren, et al.
Published: (2026)
Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
by: Kubota, Kohsuke, et al.
Published: (2026)
Combinatorial Allocation Bandits with Nonlinear Arm Utility
by: Shibukawa, Yuki, et al.
Published: (2026)
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
by: Kishimoto, Ren, et al.
Published: (2026)
A Contextual Combinatorial Bandit Approach to Negotiation
by: Li, Yexin, et al.
Published: (2024)
Contextual Combinatorial Bandits with Probabilistically Triggered Arms
by: Liu, Xutong, et al.
Published: (2023)
When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective
by: Sun, Hao, et al.
Published: (2023)
Safely Exploring Novel Actions in Recommender Systems via Deployment-Efficient Policy Learning
by: Kiyohara, Haruka, et al.
Published: (2025)
Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy Learning and Evaluation Method
by: Huang, Yong, et al.
Published: (2024)
From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
by: Erez, Liad, et al.
Published: (2025)
Learning When to Trust in Contextual Bandits
by: Ghasemi, Majid, et al.
Published: (2026)
Learning Action Embeddings for Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2023)
Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation
by: Yokozawa, Riko, et al.
Published: (2025)
Tree Ensembles for Contextual Bandits
by: Nilsson, Hannes, et al.
Published: (2024)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Online Prompt Pricing based on Combinatorial Multi-Armed Bandit and Hierarchical Stackelberg Game
by: Li, Meiling, et al.
Published: (2024)
Neural Combinatorial Clustered Bandits for Recommendation Systems
by: Atalar, Baran, et al.
Published: (2024)
Bayesian Analysis of Combinatorial Gaussian Process Bandits
by: Sandberg, Jack, et al.
Published: (2023)
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
by: Lee, Haanvid, et al.
Published: (2024)
Causal Contextual Bandits with Adaptive Context
by: Madhavan, Rahul, et al.
Published: (2024)
Diffusion Models Meet Contextual Bandits
by: Aouali, Imad
Published: (2024)
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
by: Tanaka, Koichi, et al.
Published: (2026)
Concept-driven Off Policy Evaluation
by: Majumdar, Ritam, et al.
Published: (2024)
Clustering Context in Off-Policy Evaluation
by: Guzman-Olivares, Daniel, et al.
Published: (2025)
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
by: Aouali, Imad, et al.
Published: (2024)
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
by: Zhao, Qingyue, et al.
Published: (2025)
Federated Linear Contextual Bandits with Heterogeneous Clients
by: Blaser, Ethan, et al.
Published: (2024)
Conservative Contextual Bandits: Beyond Linear Representations
by: Deb, Rohan, et al.
Published: (2024)
Linear Contextual Bandits with Hybrid Payoff: Revisited
by: Das, Nirjhar, et al.
Published: (2024)
The Sample Complexity of Multiclass and Sparse Contextual Bandits
by: Erez, Liad, et al.
Published: (2026)
Adaptive Budget Optimization for Multichannel Advertising Using Combinatorial Bandits
by: Gangopadhyay, Briti, et al.
Published: (2025)
Off-Policy Evaluation and Learning for Matching Markets
by: Hayashi, Yudai, et al.
Published: (2025)
Zero-Shot Off-Policy Learning
by: Asadulaev, Arip, et al.
Published: (2026)