MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Hongbo; Zhu, Zheqing
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.02515

_version_	1866910472088846336
author	Guo, Hongbo; Zhu, Zheqing
author_facet	Guo, Hongbo; Zhu, Zheqing
contents	Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize the contextual information and available user or item features, neural networks have been integrated into contextual bandit learning, which has triggered significant interest from both academia and industry. However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm. The huge number of items to recommend poses a significant hurdle for real-world production deployment. This paper focuses on a joint neural contextual bandit solution that serves all items to be recommended with a single model. The output consists of a predicted reward $μ$ and an uncertainty $σ$, combined through a hyper-parameter $α$ that balances exploitation and exploration, e.g., $μ + ασ$. The tuning of the parameter $α$ is typically heuristic and complex in practice due to its stochastic nature. To address this challenge, we provide both theoretical analysis and experimental findings regarding the uncertainty $σ$ of the joint neural contextual bandit model. Our analysis reveals that $σ$ has an approximate square-root relationship with the size of the last hidden layer $F$ and an inverse square-root relationship with the amount of training data $N$, i.e., $σ \propto \sqrt{\frac{F}{N}}$. The experiments, conducted with real industrial data, align with the theoretical analysis, help in understanding model behavior, and assist hyper-parameter tuning during both offline training and online deployment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_02515
institution	arXiv
publishDate	2024
record_format	arxiv
title	Uncertainty of Joint Neural Contextual Bandit
topic	Machine Learning
url	https://arxiv.org/abs/2406.02515
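
The scoring rule and scaling relation summarized in the contents field can be illustrated with a small sketch. It is not taken from the paper: the function names `ucb_score` and `rescale_sigma` and all numeric values are hypothetical, and the only assumptions are the score $μ + ασ$ and the proportionality $σ \propto \sqrt{F/N}$ stated in the abstract.

```python
import math


def ucb_score(mu: float, sigma: float, alpha: float) -> float:
    """Exploration-aware score mu + alpha * sigma (form quoted in the abstract)."""
    return mu + alpha * sigma


def rescale_sigma(sigma_ref: float, f_ref: int, n_ref: int,
                  f_new: int, n_new: int) -> float:
    """Extrapolate an uncertainty measured at (f_ref, n_ref) to (f_new, n_new).

    Assumes sigma is proportional to sqrt(F / N), where F is the size of the
    last hidden layer and N is the amount of training data. The constant of
    proportionality cancels, so only the ratio matters.
    """
    ratio = (f_new * n_ref) / (f_ref * n_new)
    return sigma_ref * math.sqrt(ratio)


# Hypothetical reference point: sigma = 0.2 with F = 64 and N = 1,000,000.
sigma_wider = rescale_sigma(0.2, 64, 1_000_000, 256, 1_000_000)
sigma_more_data = rescale_sigma(0.2, 64, 1_000_000, 64, 4_000_000)

# Score of one item under the wider model, with hypothetical mu and alpha.
score = ucb_score(mu=0.5, sigma=sigma_wider, alpha=1.0)
```

Under this assumption, quadrupling $F$ doubles the estimated uncertainty, while quadrupling $N$ halves it, which gives a rough starting point when revisiting $α$ after a change in model width or training volume.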