particularly in large urban areas. Therefore the need arises for simulating and optimizing traffic control algorithms to better accommodate this increasing demand. In this paper we study the simulation and optimization of traffic light controllers in a city and present an adaptive optimization algorithm based on reinforcement learning. We have implemented a traffic light simulator, Green Light District, that allows us to experiment with different infrastructures and to compare different traffic light controllers. Experimental results indicate that our adaptive traffic light controllers outperform fixed controllers on all studied infrastructures.

Keywords: Intelligent Traffic Light Control, Reinforcement Learning, Multi-Agent Systems (MAS), Smart Infrastructures, Transportation Research

1 Introduction

Transportation research has the goal to optimize transportation flow of people and goods. As the number of road users constantly increases, and resources provided by current infrastructures are limited, intelligent control of traffic will become a very important issue in the future. However, some limitations to the usage of intelligent traffic control exist. Avoiding traffic jams, for example, is thought to be beneficial to both environment and economy, but improved traffic flow may also lead to an increase in demand [Levinson, 2021].

There are several models for traffic simulation. In our research we focus on microscopic models that model the behavior of individual vehicles, and thereby can simulate the dynamics of groups of vehicles. Research has shown that such models yield realistic behavior [Nagel and Schreckenberg, 1992, Wahle and Schreckenberg, 2021].

Cars in urban traffic can experience long travel times due to inefficient traffic light control. Optimal control of traffic lights using sophisticated sensors and intelligent optimization algorithms might therefore be very beneficial. Optimizing traffic light switching increases road capacity and traffic flow, and can prevent traffic congestion.
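To make the notion of a microscopic model concrete, the following is a minimal sketch of the single-lane cellular-automaton update rule from the cited Nagel and Schreckenberg (1992) model. It is an illustration only and not the Green Light District simulator; the function name `nasch_step`, the circular road, and the default values for `v_max` and `p_brake` are assumptions made for this example.

```python
import random


def nasch_step(positions, velocities, road_length, v_max=5, p_brake=0.3, rng=random):
    """Advance every car on a circular single-lane road by one time step.

    positions:  distinct cell indices, listed in driving order around the ring
    velocities: current speed of each car, in cells per time step
    Returns the new (positions, velocities) lists.
    """
    n = len(positions)
    new_velocities = []
    for i in range(n):
        # Number of empty cells between this car and the car ahead.
        # The modulo handles the wrap-around of the ring road.
        ahead = positions[(i + 1) % n]
        gap = (ahead - positions[i] - 1) % road_length

        v = min(velocities[i] + 1, v_max)  # 1. accelerate towards v_max
        v = min(v, gap)                    # 2. brake so as not to hit the car ahead
        if v > 0 and rng.random() < p_brake:
            v -= 1                         # 3. random slowdown (driver imperfection)
        new_velocities.append(v)

    # 4. all cars move simultaneously
    new_positions = [(x + v) % road_length for x, v in zip(positions, new_velocities)]
    return new_positions, new_velocities
```

Each car reacts only to the gap in front of it, yet repeated application of the rule to many cars produces group-level effects such as spontaneous jams, which is the sense in which individual-vehicle models can simulate the dynamics of groups of vehicles.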
Traffic light control is a complex optimization problem, and several intelligent algorithms, such as fuzzy logic, evolutionary algorithms, and reinforcement learning (RL), have already been used in attempts to solve it. In this paper we describe a model-based, multi-agent reinforcement learning algorithm for controlling traffic lights. In our approach, reinforcement learning [Sutton and Barto, 1998, Kaelbling et al., 1996] with road-user-based value functions [Wiering, 2021] is used to determine optimal decisions for each traffic light. The decision is based on a cumulative vote of all road users waiting at a traffic junction, where each car votes using its estimated advantage (or gain) of setting its light to green. The gain value is the difference between the total waiting time the car expects over the rest of its trip if the light at which it is currently waiting is red and the corresponding time if that light is green. The waiting time until cars arrive at their destination is estimated by monitoring cars flowing