| Main Authors: | Kishimoto, Ren, Shimizu, Tatsuhiro, Kawamura, Kazuki, Muroi, Takanori, Narita, Yusuke, Sasamoto, Yuki, Tateno, Kei, Udagawa, Takuma, Saito, Yuta |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.18509 |
Similar Items
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
by: Shimizu, Tatsuhiro, et al.
Published: (2025)
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
by: Tanaka, Koichi, et al.
Published: (2026)
Safely Exploring Novel Actions in Recommender Systems via Deployment-Efficient Policy Learning
by: Kiyohara, Haruka, et al.
Published: (2025)
Counterfactual Reciprocal Recommender Systems for User-to-User Matching
by: Kawamura, Kazuki, et al.
Published: (2025)
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
by: Shimizu, Tatsuhiro, et al.
Published: (2024)
Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential Recommendation
by: Du, Wei-Wei, et al.
Published: (2025)
PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows
by: Kawamura, Kazuki, et al.
Published: (2026)
Off-Policy Learning with Limited Supply
by: Tanaka, Koichi, et al.
Published: (2026)
MultiScale Contextual Bandits for Long Term Objectives
by: Rastogi, Richa, et al.
Published: (2025)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Combinatorial Allocation Bandits with Nonlinear Arm Utility
by: Shibukawa, Yuki, et al.
Published: (2026)
Group-Sensitive Offline Contextual Bandits
by: Guo, Yihong, et al.
Published: (2025)
Offline Contextual Bandit with Counterfactual Sample Identification
by: Gilotte, Alexandre, et al.
Published: (2025)
Leveraging Offline Data in Linear Latent Contextual Bandits
by: Kausik, Chinmaya, et al.
Published: (2024)
Direction-Aware Offline-to-Online Learning in Linear Contextual Bandits
by: Han, Zean, et al.
Published: (2026)
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
NRR-Core: Non-Resolution Reasoning as a Computational Framework for Contextual Identity and Ambiguity Preservation
by: Saito, Kei
Published: (2025)
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
by: Kiyohara, Haruka, et al.
Published: (2024)
Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing
by: Ryu, J. Jon, et al.
Published: (2025)
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
by: Kishimoto, Ren, et al.
Published: (2026)
Prompt Optimization with Logged Bandit Data
by: Kiyohara, Haruka, et al.
Published: (2025)
Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy Learning and Evaluation Method
by: Huang, Yong, et al.
Published: (2024)
Context-Action Embedding Learning for Off-Policy Evaluation in Contextual Bandits
by: Chandak, Kushagra, et al.
Published: (2025)
Determination of Majorana type-phases from the time evolution of lepton numbers
by: Benoit, Nicholas J., et al.
Published: (2022)
Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation
by: Chen, Ziru, et al.
Published: (2026)
HTML-LSTM: Information Extraction from HTML Tables in Web Pages using Tree-Structured LSTM
by: Kawamura, Kazuki, et al.
Published: (2024)
Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
by: Qin, Hao, et al.
Published: (2026)
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
by: Zhao, Qingyue, et al.
Published: (2025)
Contextual Combinatorial Bandits with Changing Action Sets via Gaussian Processes
by: Nika, Andi, et al.
Published: (2021)
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
by: Saito, Yuta, et al.
Published: (2024)
Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps
by: Omura, Motoki, et al.
Published: (2025)
Learning Multiple Object States from Actions via Large Language Models
by: Tateno, Masatoshi, et al.
Published: (2024)
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
by: Zhao, Qingyue, et al.
Published: (2026)
Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments
by: Verma, Abhishek, et al.
Published: (2025)
Bayesian Regret Minimization in Offline Bandits
by: Petrik, Marek, et al.
Published: (2023)
Sparse Nonparametric Contextual Bandits
by: Flynn, Hamish, et al.
Published: (2025)
On permutation-invariant neural networks
by: Kimura, Masanari, et al.
Published: (2024)
Linear Contextual Bandits with Interference
by: Xu, Yang, et al.
Published: (2024)
AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models
by: Kawamura, Kazuki, et al.
Published: (2024)
Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback
by: Goyal, Tanmay, et al.
Published: (2025)