Bibliographic details
Main authors:	Aminian, Gholamali; Behnamnia, Armin; Vega, Roberto; Toni, Laura; Shi, Chengchun; Rabiee, Hamid R.; Rivasplata, Omar; Rodrigues, Miguel R. D.
Format:	Preprint
Published:	2022
Subjects:	Machine Learning; Artificial Intelligence; Information Theory
Online access:	https://arxiv.org/abs/2209.07148

_version_	1866913236592361472
author	Aminian, Gholamali; Behnamnia, Armin; Vega, Roberto; Toni, Laura; Shi, Chengchun; Rabiee, Hamid R.; Rivasplata, Omar; Rodrigues, Miguel R. D.
author_facet	Aminian, Gholamali; Behnamnia, Armin; Vega, Roberto; Toni, Laura; Shi, Chengchun; Rabiee, Hamid R.; Rivasplata, Omar; Rodrigues, Miguel R. D.
contents	Off-policy learning methods aim to learn a policy from logged data, which include the context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework, which also assumes access to propensity scores. We propose learning methods for problems where feedback is missing for some samples, so the logged data contain both samples with feedback and samples without it. We refer to this setting as semi-supervised batch learning from logged data, which arises in a wide range of application domains. To address it, we derive a novel upper bound on the true risk under the inverse propensity score estimator. Using this bound, we propose a regularized semi-supervised batch learning method for logged data in which the regularization term is feedback-independent and can therefore be evaluated on the logged missing-feedback samples. Consequently, even though feedback is present for only some samples, a policy can be learned by leveraging the missing-feedback samples. Experiments derived from benchmark datasets indicate that these algorithms learn policies that perform better than the logging policies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2209_07148
institution	arXiv
publishDate	2022
record_format	arxiv
title	Semi-supervised Batch Learning From Logged Data
topic	Machine Learning; Artificial Intelligence; Information Theory
url	https://arxiv.org/abs/2209.07148
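
The objective described in the contents field (an inverse propensity score risk on the samples with feedback, plus a feedback-independent regularizer evaluated on all logged samples) can be illustrated with a rough sketch. This is not the authors' implementation: the record does not say which regularizer the paper uses, so the KL divergence between the learned and logging policies is an assumption made here for illustration, as are the function names, the weight `lam`, and the requirement that all probabilities are strictly positive.

```python
import numpy as np


def ips_risk(costs, target_probs, logging_probs):
    """Inverse propensity score estimate of the risk.

    Uses only samples that have feedback. Each argument is a 1-D array:
    the observed cost, the learned policy's probability of the logged
    action, and the logging policy's probability (propensity score).
    """
    weights = target_probs / logging_probs
    return float(np.mean(costs * weights))


def kl_regularizer(target_dist, logging_dist):
    """Mean KL(target || logging) over contexts (hypothetical regularizer).

    Each argument is a 2-D array of shape (contexts, actions). No feedback
    is needed, so missing-feedback samples can be included.
    Assumes strictly positive probabilities.
    """
    log_ratio = np.log(target_dist) - np.log(logging_dist)
    return float(np.mean(np.sum(target_dist * log_ratio, axis=1)))


def regularized_objective(costs, target_probs, logging_probs,
                          target_dist_all, logging_dist_all, lam=0.1):
    """IPS risk on labeled samples plus lam times the regularizer on all samples."""
    risk = ips_risk(costs, target_probs, logging_probs)
    penalty = kl_regularizer(target_dist_all, logging_dist_all)
    return risk + lam * penalty
```

When the learned policy equals the logging policy, the importance weights are all 1 and the penalty vanishes, so the objective reduces to the average observed cost.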
