Saved in:
Bibliographic Details
Main Authors: Kwesiga, Dickness Kakitahi, Guin, Angshuman, Hunter, Michael
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2503.02189
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866913921760231424
author Kwesiga, Dickness Kakitahi
Guin, Angshuman
Hunter, Michael
author_facet Kwesiga, Dickness Kakitahi
Guin, Angshuman
Hunter, Michael
contents Previous studies that have formulated multi-agent reinforcement learning (RL) algorithms for adaptive traffic signal control have primarily used value-based RL methods. However, recent literature has shown that policy-based methods may perform better in partially observable environments. Additionally, RL methods remain largely untested for real-world normally signal timing plans because of the simplifying assumptions common in the literature. The current study attempts to address these gaps and formulates a multi-agent proximal policy optimization (MA-PPO) algorithm to implement adaptive and coordinated traffic control along an arterial corridor. The formulated MA-PPO has a centralized-critic architecture under a centralized training and decentralized execution framework. Agents are designed to allow selection and implementation of up to eight signal phases, as commonly implemented in field controllers. The formulated algorithm is tested on a simulated real-world seven intersection corridor. The speed of convergence for each agent was found to depend on the size of the action space, which depends on the number and sequence of signal phases. The performance of the formulated MA-PPO adaptive control algorithm is compared with the field implemented actuated-coordinated signal control (ASC), modeled using PTV-Vissim-MaxTime software in the loop simulation (SILs). The trained MA-PPO performed significantly better than the ASC for all movements. Compared to ASC the MA-PPO showed 2% and 24% improvements in travel time in the primary and secondary coordination directions, respectively. For cross streets movements MA-PPO also showed significant crossing time reductions. Volume sensitivity experiments revealed that the formulated MA-PPO demonstrated good stability, robustness, and adaptability to changes in traffic demand.
format Preprint
id arxiv_https___arxiv_org_abs_2503_02189
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor
Kwesiga, Dickness Kakitahi
Guin, Angshuman
Hunter, Michael
Multiagent Systems
Previous studies that have formulated multi-agent reinforcement learning (RL) algorithms for adaptive traffic signal control have primarily used value-based RL methods. However, recent literature has shown that policy-based methods may perform better in partially observable environments. Additionally, RL methods remain largely untested for real-world normally signal timing plans because of the simplifying assumptions common in the literature. The current study attempts to address these gaps and formulates a multi-agent proximal policy optimization (MA-PPO) algorithm to implement adaptive and coordinated traffic control along an arterial corridor. The formulated MA-PPO has a centralized-critic architecture under a centralized training and decentralized execution framework. Agents are designed to allow selection and implementation of up to eight signal phases, as commonly implemented in field controllers. The formulated algorithm is tested on a simulated real-world seven intersection corridor. The speed of convergence for each agent was found to depend on the size of the action space, which depends on the number and sequence of signal phases. The performance of the formulated MA-PPO adaptive control algorithm is compared with the field implemented actuated-coordinated signal control (ASC), modeled using PTV-Vissim-MaxTime software in the loop simulation (SILs). The trained MA-PPO performed significantly better than the ASC for all movements. Compared to ASC the MA-PPO showed 2% and 24% improvements in travel time in the primary and secondary coordination directions, respectively. For cross streets movements MA-PPO also showed significant crossing time reductions. Volume sensitivity experiments revealed that the formulated MA-PPO demonstrated good stability, robustness, and adaptability to changes in traffic demand.
title Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor
topic Multiagent Systems
url https://arxiv.org/abs/2503.02189