Bibliographic Details
Main Authors: Sequeira, André, Santos, Luis Paulo, Barbosa, Luis Soares
Format: Preprint
Published: 2024
Subjects: Quantum Physics, Machine Learning
Online Access: https://arxiv.org/abs/2406.09614
_version_ 1866929385733357568
author Sequeira, André
Santos, Luis Paulo
Barbosa, Luis Soares
author_facet Sequeira, André
Santos, Luis Paulo
Barbosa, Luis Soares
contents This research explores the trainability of parameterized quantum circuit-based policies in reinforcement learning, an area that has recently seen a surge in empirical exploration. While some studies suggest improved sample complexity using quantum gradient estimation, the efficient trainability of these policies remains an open question. Our findings reveal significant challenges, including standard barren plateaus with exponentially small gradients and gradient explosion. These phenomena depend on the type of basis-state partitioning and on how the partitions are mapped onto actions. For a polynomial number of actions, a trainable window can be ensured with a polynomial number of measurements if a contiguous-like partitioning of basis states is employed. These results are empirically validated in a multi-armed bandit environment.
format Preprint
id arxiv_https___arxiv_org_abs_2406_09614
institution arXiv
publishDate 2024
record_format arxiv
title Trainability issues in quantum policy gradients
topic Quantum Physics
Machine Learning
url https://arxiv.org/abs/2406.09614
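
The abstract refers to partitioning measured basis states and mapping the partitions onto actions. The sketch below is one possible reading of a contiguous partitioning, not the authors' implementation: it assumes basis states are indexed 0 to 2^n - 1 and split into consecutive blocks, one block per action. The function names (contiguous_partition, policy_from_probs, estimate_policy) and the uniform test distribution are hypothetical and chosen only for illustration.

```python
import numpy as np


def contiguous_partition(n_qubits, n_actions):
    """Map each basis-state index b to an action (assumed contiguous scheme).

    Index b goes to action floor(b * n_actions / 2**n_qubits), so consecutive
    indices land in the same block.
    """
    dim = 2 ** n_qubits
    return (np.arange(dim) * n_actions) // dim


def policy_from_probs(probs, assignment, n_actions):
    """Exact action probabilities: sum basis-state probabilities per block."""
    return np.bincount(assignment, weights=probs, minlength=n_actions)


def estimate_policy(probs, assignment, n_actions, shots, rng):
    """Estimate the policy from a finite number of simulated measurements."""
    samples = rng.choice(len(probs), size=shots, p=probs)
    counts = np.bincount(assignment[samples], minlength=n_actions)
    return counts / shots


if __name__ == "__main__":
    n_qubits, n_actions = 3, 4
    assignment = contiguous_partition(n_qubits, n_actions)
    # Uniform distribution stands in for the circuit's output (an assumption).
    probs = np.full(2 ** n_qubits, 1.0 / 2 ** n_qubits)
    print(assignment)
    print(policy_from_probs(probs, assignment, n_actions))
    print(estimate_policy(probs, assignment, n_actions, 1000,
                          np.random.default_rng(0)))
```

In this reading, the number of shots passed to estimate_policy plays the role of the measurement budget mentioned in the abstract; the paper's actual definitions of the partitioning and of the trainable window may differ.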