Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Shen, Chenglei, Zhan, Yi, Yu, Weijie, Zhang, Xiao, Xu, Jun
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.08067
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908820566966272
author	Shen, Chenglei Zhan, Yi Yu, Weijie Zhang, Xiao Xu, Jun
author_facet	Shen, Chenglei Zhan, Yi Yu, Weijie Zhang, Xiao Xu, Jun
contents	In real-world streaming recommender systems, user preferences evolve dynamically over time. Existing bandit-based methods treat time merely as a timestamp, neglecting its explicit relationship with user preferences and leading to suboptimal performance. Moreover, online learning methods often suffer from inefficient exploration-exploitation during the early online phase. To address these issues, we propose HyperBandit+, a novel contextual bandit policy that integrates a time-aware hypernetwork to adapt to time-varying user preferences and employs a large language model-assisted warm-start mechanism (LLM Start) to enhance exploration-exploitation efficiency in the early online phase. Specifically, HyperBandit+ leverages a neural network that takes time features as input and generates parameters for estimating time-varying rewards by capturing the correlation between time and user preferences. Additionally, the LLM Start mechanism employs multi-step data augmentation to simulate realistic interaction data for effective offline learning, providing warm-start parameters for the bandit policy in the early online phase. To meet real-time streaming recommendation demands, we adopt low-rank factorization to reduce hypernetwork training complexity. Theoretically, we rigorously establish a sublinear regret upper bound that accounts for both the hypernetwork and the LLM warm-start mechanism. Extensive experiments on real-world datasets demonstrate that HyperBandit+ consistently outperforms state-of-the-art baselines in terms of accumulated rewards.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_08067
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Enhancing Bandit Algorithms with LLMs for Time-varying User Preferences in Streaming Recommendations Shen, Chenglei Zhan, Yi Yu, Weijie Zhang, Xiao Xu, Jun Machine Learning In real-world streaming recommender systems, user preferences evolve dynamically over time. Existing bandit-based methods treat time merely as a timestamp, neglecting its explicit relationship with user preferences and leading to suboptimal performance. Moreover, online learning methods often suffer from inefficient exploration-exploitation during the early online phase. To address these issues, we propose HyperBandit+, a novel contextual bandit policy that integrates a time-aware hypernetwork to adapt to time-varying user preferences and employs a large language model-assisted warm-start mechanism (LLM Start) to enhance exploration-exploitation efficiency in the early online phase. Specifically, HyperBandit+ leverages a neural network that takes time features as input and generates parameters for estimating time-varying rewards by capturing the correlation between time and user preferences. Additionally, the LLM Start mechanism employs multi-step data augmentation to simulate realistic interaction data for effective offline learning, providing warm-start parameters for the bandit policy in the early online phase. To meet real-time streaming recommendation demands, we adopt low-rank factorization to reduce hypernetwork training complexity. Theoretically, we rigorously establish a sublinear regret upper bound that accounts for both the hypernetwork and the LLM warm-start mechanism. Extensive experiments on real-world datasets demonstrate that HyperBandit+ consistently outperforms state-of-the-art baselines in terms of accumulated rewards.
title	Enhancing Bandit Algorithms with LLMs for Time-varying User Preferences in Streaming Recommendations
topic	Machine Learning
url	https://arxiv.org/abs/2602.08067

Similar Items