[Wiering, 2021] is used to determine optimal decisions for each traffic light. The decision is based on a cumulative vote of all road users waiting at a traffic junction, where each car votes using its estimated advantage (or gain) of setting its light to green. The gain value is the difference between the total time the car expects to wait during the rest of its trip if the light at which it is currently standing is red, and the time it expects to wait if that light is green. The waiting time until cars arrive at their destination is estimated by monitoring cars flowing through the infrastructure and using reinforcement learning (RL) algorithms. We compare the performance of our model-based RL method to that of other controllers using the Green Light District simulator (GLD). GLD is a traffic simulator that allows us to design arbitrary infrastructures and traffic patterns, monitor traffic flow statistics such as average waiting times, and test different traffic light controllers. The experimental results show that in crowded traffic, the RL controllers outperform all tested non-adaptive controllers. We also test the use of the learned average waiting times for choosing routes of cars through the city (co-learning), and show that by using co-learning road users can avoid bottlenecks.

This paper is organized as follows. Section 2 describes how traffic can be modelled, predicted, and controlled. Section 3 explains reinforcement learning and shows some of its applications. Section 4 surveys several previous approaches to traffic light control and introduces our new algorithm. Section 5 describes the simulator we used for our experiments, and Section 6 presents our experiments and their results. We conclude in Section 7.

2 Modelling and Controlling Traffic

In this section, we focus on the use of information technology in transportation.
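As an aside on the gain-based voting summarized in the introduction, the following minimal Python sketch may help make the idea concrete. It is an illustration under stated assumptions, not the authors' implementation: the names `Car`, `gain`, `cumulative_gains`, and `choose_green` are hypothetical, the waiting-time estimates are passed in as plain numbers rather than learned by RL, and the notion of a list of allowed light configurations per junction is assumed here for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Car:
    light_id: str         # light the car is currently waiting at (hypothetical field)
    wait_if_red: float    # estimated remaining waiting time if that light is red
    wait_if_green: float  # estimated remaining waiting time if that light is green


def gain(car: Car) -> float:
    """Estimated advantage for this car of setting its light to green."""
    return car.wait_if_red - car.wait_if_green


def cumulative_gains(cars: list[Car]) -> dict[str, float]:
    """Sum the votes (gains) of all waiting cars, grouped by traffic light."""
    totals: dict[str, float] = {}
    for car in cars:
        totals[car.light_id] = totals.get(car.light_id, 0.0) + gain(car)
    return totals


def choose_green(cars: list[Car], allowed_configs: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Pick the allowed configuration of green lights with the largest total gain.

    `allowed_configs` is an assumption of this sketch: each entry lists the
    lights that may safely be green at the same time.
    """
    totals = cumulative_gains(cars)
    return max(allowed_configs, key=lambda cfg: sum(totals.get(l, 0.0) for l in cfg))
```

For example, with two cars waiting at a light called `"north"` and one at `"east"`, `choose_green` selects whichever configuration collects the larger summed gain. In the method described above, the two waiting-time estimates per car would come from the learned model rather than being supplied by hand.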
A lot of ground can be gained in this area, and Intelligent Transportation Systems (ITS) have gained the interest of several governments and commercial companies [TenT expert group on ITS, 2021; White Paper, 2021; EPA98, 1998]. ITS research includes in-car safety systems, simulating the effects of infrastructural changes, route planning, optimization of transport, and smart infrastructures. Its main goals are improving safety, minimizing travel time, and increasing the capacity of infrastructures. Such improvements are beneficial to health, the economy, and the environment, and this shows in the budget allocated for ITS. In this paper we are mainly interested in the optimization of traffic flow, thus effectively minimizing average traveling (or waiting) times for cars. A common tool for analyzing traffic is the traffic simulator. In this section we will first describe two techniques commonly used to model traffic. We will then describe how models can be used to obtain real-time traffic information or predict traffic conditions. Afterwards we describe how information can be communicated as a means of controlling traffic, and what the effect of this communication on traffic