Bibliographic Details
Main Authors: Miguel Cardoso Moreira, Tiago; Claro, João; Neves-Moreira, Fábio
Format: Digital resource
Language: English
Published: Zenodo 2024
Online Access: https://doi.org/10.5281/zenodo.14203358
Table of Contents:
  • <p>This dissertation tackles the in-store picking challenges that retailers face under the new grocery shopping paradigm, in which the online and physical channels are integrated in an omnichannel approach that responds better to the different kinds of customer demand. It is motivated by the growing effort required to serve customers in a competitive market, both from the customer’s point of view, where the service level should be kept as high as possible, and from the retailer’s point of view, where efficient operations keep costs low while providing better services. The goal was therefore to train an agent with Proximal Policy Optimization, a deep Reinforcement Learning algorithm, to learn a picking policy that travels the minimum possible distance and avoids encounters with physical customers, in an environment modelled as a Markov Decision Process. The first step was to devise a set of features to drive the learning process. Next, synthetic stores of different sizes were created. A hyperparameter tuning technique was used to tailor the model's hyperparameters to each store configuration, and the best-performing sets of hyperparameters were then used in a more extensive training process. The results were tested and, based on the stability of the training process and the performance achieved, a transfer learning technique was reproduced to verify whether an agent trained on a simpler task could be retrained on a more complex one. Finally, this strategy was applied to a real-world store configuration, in which the store was divided into subsections that were added sequentially to the training process. 
Some important real-world considerations were not implemented, which leaves room for future work on applying this methodology to efficient in-store picking in real-world scenarios under other parameters. The results obtained validate this approach for the real-world scenario: the learning phase reached 0.95 encounters per product picked and a reward of 77.28 per product picked.</p>
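
The abstract describes the picking problem as a Markov Decision Process in which the agent is rewarded for short routes and penalised for meeting physical customers. The sketch below is only an illustration of that idea, not the dissertation's environment. The grid layout, the class name `ToyStore`, the helper `run_episode`, the static customers and all three reward weights are assumptions made for this example. It does not implement Proximal Policy Optimization; it only shows how a step reward and the "encounters per product picked" metric could be computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical reward weights, chosen only for illustration.
STEP_COST = -1.0          # every move costs distance
ENCOUNTER_PENALTY = -5.0  # stepping onto a cell occupied by a customer
PICK_REWARD = 10.0        # reaching a cell that holds an ordered product

MOVES: Dict[str, Cell] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


@dataclass
class ToyStore:
    """Toy grid store: state is the picker cell plus the remaining products."""
    rows: int
    cols: int
    products: Set[Cell]
    customers: Set[Cell]  # assumed static here; real customers would move
    picker: Cell = (0, 0)
    encounters: int = 0
    picked: int = 0

    def step(self, action: str) -> Tuple[Cell, float, bool]:
        """Apply one move and return (new picker cell, reward, done)."""
        dr, dc = MOVES[action]
        # Clamp the move so the picker stays inside the grid.
        r = min(max(self.picker[0] + dr, 0), self.rows - 1)
        c = min(max(self.picker[1] + dc, 0), self.cols - 1)
        self.picker = (r, c)

        reward = STEP_COST
        if self.picker in self.customers:
            self.encounters += 1
            reward += ENCOUNTER_PENALTY
        if self.picker in self.products:
            self.products.remove(self.picker)
            self.picked += 1
            reward += PICK_REWARD
        done = not self.products  # episode ends when every product is picked
        return self.picker, reward, done


def run_episode(store: ToyStore, actions: List[str]) -> float:
    """Play a fixed action sequence and return the total reward."""
    total = 0.0
    for action in actions:
        _, reward, done = store.step(action)
        total += reward
        if done:
            break
    return total
```

With a 3x3 store holding one product at `(0, 2)` and one customer at `(0, 1)`, calling `run_episode(store, ["right", "right"])` walks through the customer to reach the product. Dividing `store.encounters` by `store.picked` afterwards gives the same kind of per-product ratio that the abstract reports.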