| Main Authors: | Saito, Yuta; Yao, Jihan; Joachims, Thorsten |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.06151 |
Similar Items
MultiScale Contextual Bandits for Long Term Objectives
by: Rastogi, Richa, et al.
Published: (2025)
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
by: Kiyohara, Haruka, et al.
Published: (2024)
Off-Policy Evaluation and Learning for Matching Markets
by: Hayashi, Yudai, et al.
Published: (2025)
Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
by: Saito, Yuta, et al.
Published: (2024)
Prompt Optimization with Logged Bandit Data
by: Kiyohara, Haruka, et al.
Published: (2025)
Long-term Off-Policy Evaluation and Learning
by: Saito, Yuta, et al.
Published: (2024)
Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
by: Kubota, Kohsuke, et al.
Published: (2026)
Off-Policy Learning with Limited Supply
by: Tanaka, Koichi, et al.
Published: (2026)
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
by: Aouali, Imad, et al.
Published: (2024)
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
by: Tanaka, Koichi, et al.
Published: (2026)
Efficient Off-Policy Learning for High-Dimensional Action Spaces
by: Otto, Fabian, et al.
Published: (2024)
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
by: Shimizu, Tatsuhiro, et al.
Published: (2024)
Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation
by: Aouali, Imad, et al.
Published: (2025)
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
by: Shimizu, Tatsuhiro, et al.
Published: (2025)
A General Framework for Off-Policy Learning with Partially-Observed Reward
by: Takehi, Rikiya, et al.
Published: (2025)
Fairness in Ranking under Disparate Uncertainty
by: Rastogi, Richa, et al.
Published: (2023)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Learning Action Embeddings for Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2023)
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
by: Meng, Wenjia, et al.
Published: (2024)
Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling
by: Takahashi, Tatsuki, et al.
Published: (2025)
Context-Action Embedding Learning for Off-Policy Evaluation in Contextual Bandits
by: Chandak, Kushagra, et al.
Published: (2025)
Reinforcing Language Agents via Policy Optimization with Action Decomposition
by: Wen, Muning, et al.
Published: (2024)
Language-Based User Profiles for Recommendation
by: Zhou, Joyce, et al.
Published: (2024)
End-to-end Training for Recommendation with Language-based User Profiles
by: Gao, Zhaolin, et al.
Published: (2024)
Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards
by: Scherer, Christian, et al.
Published: (2026)
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization
by: Xia, Linxuan, et al.
Published: (2026)
When Do Off-Policy and On-Policy Policy Gradient Methods Align?
by: Mambelli, Davide, et al.
Published: (2024)
Zero-Shot Off-Policy Learning
by: Asadulaev, Arip, et al.
Published: (2026)
Sequential Off-Policy Learning with Logarithmic Smoothing
by: Haddouche, Maxime, et al.
Published: (2025)
On the Reuse Bias in Off-Policy Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2022)
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
by: Bolland, Adrien, et al.
Published: (2024)
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
by: Hong, Joey, et al.
Published: (2025)
Automated Off-Policy Estimator Selection via Supervised Learning
by: Felicioni, Nicolò, et al.
Published: (2024)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Explainable Reinforcement Learning via Temporal Policy Decomposition
by: Ruggeri, Franco, et al.
Published: (2025)
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
by: Zhao, Shiwan, et al.
Published: (2026)
Off-Policy Value-Based Reinforcement Learning for Large Language Models
by: Wang, Peng-Yuan, et al.
Published: (2026)
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
by: Mroueh, Youssef, et al.
Published: (2025)
Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
by: Pan, Chaofan, et al.
Published: (2025)