MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Kurečka, Martin, Nevyhoštěný, Václav, Novotný, Petr, Unčovský, Vít
Natura:	Preprint
Pubblicazione:	2024
Soggetti:	Artificial Intelligence
Accesso online:	https://arxiv.org/abs/2412.13962
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866913617116397568
author	Kurečka, Martin Nevyhoštěný, Václav Novotný, Petr Unčovský, Vít
author_facet	Kurečka, Martin Nevyhoštěný, Václav Novotný, Petr Unčovský, Vít
contents	Constrained Markov decision processes (CMDPs), in which the agent optimizes expected payoffs while keeping the expected cost below a given threshold, are the leading framework for safe sequential decision making under stochastic uncertainty. Among algorithms for planning and learning in CMDPs, methods based on Monte Carlo tree search (MCTS) have particular importance due to their efficiency and extendibility to more complex frameworks (such as partially observable settings and games). However, current MCTS-based methods for CMDPs either struggle with finding safe (i.e., constraint-satisfying) policies, or are too conservative and do not find valuable policies. We introduce Threshold UCT (T-UCT), an online MCTS-based algorithm for CMDP planning. Unlike previous MCTS-based CMDP planners, T-UCT explicitly estimates Pareto curves of cost-utility trade-offs throughout the search tree, using these together with a novel action selection and threshold update rules to seek safe and valuable policies. Our experiments demonstrate that our approach significantly outperforms state-of-the-art methods from the literature.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_13962
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves Kurečka, Martin Nevyhoštěný, Václav Novotný, Petr Unčovský, Vít Artificial Intelligence Constrained Markov decision processes (CMDPs), in which the agent optimizes expected payoffs while keeping the expected cost below a given threshold, are the leading framework for safe sequential decision making under stochastic uncertainty. Among algorithms for planning and learning in CMDPs, methods based on Monte Carlo tree search (MCTS) have particular importance due to their efficiency and extendibility to more complex frameworks (such as partially observable settings and games). However, current MCTS-based methods for CMDPs either struggle with finding safe (i.e., constraint-satisfying) policies, or are too conservative and do not find valuable policies. We introduce Threshold UCT (T-UCT), an online MCTS-based algorithm for CMDP planning. Unlike previous MCTS-based CMDP planners, T-UCT explicitly estimates Pareto curves of cost-utility trade-offs throughout the search tree, using these together with a novel action selection and threshold update rules to seek safe and valuable policies. Our experiments demonstrate that our approach significantly outperforms state-of-the-art methods from the literature.
title	Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves
topic	Artificial Intelligence
url	https://arxiv.org/abs/2412.13962

Documenti analoghi