Bibliographic Details
Main Authors: Nameki, Shoma; Nakamura, Atsuyoshi; Komiyama, Junpei; Tabata, Koji
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.22600
author Nameki, Shoma
Nakamura, Atsuyoshi
Komiyama, Junpei
Tabata, Koji
contents We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $\mathcal{T}$ and a threshold $\theta$, a player must answer whether the root node value of $\mathcal{T}$ is at least $\theta$. In the given tree, each internal node is labeled 'MAX' or 'MIN', and the value of a 'MAX'-labeled ('MIN'-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution from which the player can sample rewards. For this problem, we develop a $\delta$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy substantially improves empirical sample complexity and reduces the per-round computational cost from linear to logarithmic in the number of arms.
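As an illustration of the problem statement in the abstract, the following Python sketch computes the root value of a 'MAX'/'MIN' tree and answers the threshold question, under the simplifying assumption that the leaf means are known. In the actual problem the means are unknown and must be estimated by sampling; this sketch is only the "oracle" answer the sampling algorithm must recover with probability at least $1 - \delta$, not the authors' Track-and-Stop algorithm. The names `Leaf`, `Internal`, `node_value`, and `root_at_least` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    # Mean reward of this leaf's distribution (unknown to the player in the real problem;
    # assumed known here purely for illustration).
    mean: float


@dataclass
class Internal:
    label: str  # assumed to be either "MAX" or "MIN"
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Internal]


def node_value(node: Node) -> float:
    """Leaves return their mean; 'MAX'/'MIN' nodes take the max/min of their child values."""
    if isinstance(node, Leaf):
        return node.mean
    if not node.children:
        raise ValueError("internal node must have at least one child")
    child_values = [node_value(child) for child in node.children]
    if node.label == "MAX":
        return max(child_values)
    if node.label == "MIN":
        return min(child_values)
    raise ValueError(f"unknown label: {node.label!r}")


def root_at_least(tree: Node, theta: float) -> bool:
    """The question the player must answer: is the root value at least theta?"""
    return node_value(tree) >= theta


if __name__ == "__main__":
    # Depth-2 example: a MAX root over two MIN nodes.
    example = Internal("MAX", [
        Internal("MIN", [Leaf(0.3), Leaf(0.6)]),  # MIN value 0.3
        Internal("MIN", [Leaf(0.5), Leaf(0.7)]),  # MIN value 0.5
    ])
    print(node_value(example))           # 0.5
    print(root_at_least(example, 0.45))  # True
    print(root_at_least(example, 0.55))  # False
```

Because the root value depends only on the leaves that determine the minimax path, a sampling algorithm can, in principle, concentrate samples on those leaves; how to allocate samples optimally is what the paper's Track-and-Stop approach addresses.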
format Preprint
id arxiv_https___arxiv_org_abs_2601_22600
institution arXiv
publishDate 2026
record_format arxiv
title An Efficient Algorithm for Thresholding Monte Carlo Tree Search
topic Machine Learning
url https://arxiv.org/abs/2601.22600