Bibliographic Details
Main Authors: Chen, Siguang, Lv, Chunli, Xie, Miao
Format: Preprint
Published: 2026
Subjects: Computation and Language, Machine Learning
Online Access: https://arxiv.org/abs/2601.12945
_version_ 1866911494876168192
author Chen, Siguang
Lv, Chunli
Xie, Miao
contents Large language models (LLMs) have become powerful and widely used systems for language understanding and generation, while multi-armed bandit (MAB) algorithms provide a principled framework for adaptive decision-making under uncertainty. This survey explores the potential at the intersection of these two fields. To the best of our knowledge, it is the first survey to systematically review the bidirectional interaction between large language models and multi-armed bandits at the component level. We highlight the benefits in both directions: MAB algorithms address critical LLM challenges, spanning from pre-training to retrieval-augmented generation (RAG) and personalization. Conversely, LLMs enhance MAB systems by redefining core components such as arm definition and environment modeling, thereby improving decision-making in sequential tasks. We analyze existing LLM-enhanced bandit systems and bandit-enhanced LLM systems, providing insights into their design, methodologies, and performance. We identify key challenges and representative findings to help guide future research. An accompanying GitHub repository that indexes relevant literature is available at https://github.com/bucky1119/Awesome-LLM-Bandit-Interaction.
format Preprint
id arxiv_https___arxiv_org_abs_2601_12945
institution arXiv
publishDate 2026
record_format arxiv
title A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2601.12945