Staff View: An Efficient Algorithm for Thresholding Monte Carlo Tree Search :: Library Catalog

Bibliographic Details
Main Authors:	Nameki, Shoma; Nakamura, Atsuyoshi; Komiyama, Junpei; Tabata, Koji
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.22600

_version_	1866911409104748544
author	Nameki, Shoma; Nakamura, Atsuyoshi; Komiyama, Junpei; Tabata, Koji
author_facet	Nameki, Shoma; Nakamura, Atsuyoshi; Komiyama, Junpei; Tabata, Koji
contents	We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $\mathcal{T}$ and a threshold $\theta$, a player must answer whether the root node value of $\mathcal{T}$ is at least $\theta$ or not. In the given tree, 'MAX' or 'MIN' is labeled on each internal node, and the value of a 'MAX'-labeled ('MIN'-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution, from which the player can sample rewards. For this problem, we develop a $\delta$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy leads to a substantial improvement in empirical sample complexity, as well as reducing the per-round computational cost from linear to logarithmic in the number of arms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_22600
institution	arXiv
publishDate	2026
record_format	arxiv
title	An Efficient Algorithm for Thresholding Monte Carlo Tree Search
topic	Machine Learning
url	https://arxiv.org/abs/2601.22600
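
The problem statement in the contents field can be illustrated with a small sketch. It only shows how the root value of a MAX/MIN tree is defined and compared against a threshold, assuming the leaf means are known. In the actual problem the means are unknown and must be estimated by sampling, so this is not the paper's algorithm. The names `Leaf`, `Node`, `value`, and `root_at_least` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    # True mean reward of the leaf's distribution.
    # Assumption: known here for illustration; unknown to the player in the real problem.
    mean: float


@dataclass
class Node:
    label: str  # "MAX" or "MIN"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def value(t: Union[Node, Leaf]) -> float:
    """Value of a node: a leaf's mean, or the max/min of its child values."""
    if isinstance(t, Leaf):
        return t.mean
    child_values = [value(c) for c in t.children]
    return max(child_values) if t.label == "MAX" else min(child_values)


def root_at_least(t: Union[Node, Leaf], theta: float) -> bool:
    """The question the player must answer: is the root value at least theta?"""
    return value(t) >= theta


# A depth-2 tree: a MAX root over two MIN nodes.
tree = Node("MAX", [
    Node("MIN", [Leaf(0.7), Leaf(0.4)]),
    Node("MIN", [Leaf(0.6), Leaf(0.55)]),
])
```

With these leaf means the two MIN nodes evaluate to 0.4 and 0.55, so the root value is 0.55: `root_at_least(tree, 0.5)` holds and `root_at_least(tree, 0.6)` does not.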

Similar Items