[Main text]
traffic light. The decision is based on a cumulative vote of all road users waiting at a traffic junction, where each car votes using its estimated advantage (or gain) of setting its light to green. The gain value is the difference between the total time the car expects to wait during the rest of its trip if the light at which it is currently waiting is red, and the total time it expects to wait if that light is green. The waiting time until cars arrive at their destination is estimated by monitoring cars flowing through the infrastructure and using reinforcement learning (RL) algorithms. We compare the performance of our model-based RL method to that of other controllers using the Green Light District simulator (GLD). GLD is a traffic simulator that allows us to design arbitrary infrastructures and traffic patterns, monitor traffic flow statistics such as average waiting times, and test different traffic light controllers. The experimental results show that in crowded traffic, the RL controllers outperform all other tested non-adaptive controllers. We also test the use of the learned average waiting times for choosing routes of cars through the city (co-learning), and show that by using co-learning road users can avoid bottlenecks.
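The voting rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `car_gain` and `choose_green_light` are hypothetical, the expected waiting times are passed in as plain numbers rather than learned by RL, and the junction is simplified to switching exactly one light to green.

```python
from collections import defaultdict


def car_gain(wait_if_red, wait_if_green):
    """Gain for one car: expected remaining wait if its light is red
    minus expected remaining wait if its light is green."""
    return wait_if_red - wait_if_green


def choose_green_light(votes):
    """Return the light with the largest cumulative gain.

    votes: iterable of (light_id, wait_if_red, wait_if_green), one entry
    per waiting car. In the paper these estimates come from RL; here they
    are assumed to be given. Returns None when no car is waiting.
    """
    totals = defaultdict(float)
    for light_id, wait_if_red, wait_if_green in votes:
        totals[light_id] += car_gain(wait_if_red, wait_if_green)
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical example: two cars at the "north" light, one at "east".
example_votes = [
    ("north", 40.0, 25.0),  # gain 15
    ("north", 30.0, 22.0),  # gain 8
    ("east", 50.0, 20.0),   # gain 30
]
selected = choose_green_light(example_votes)
```

With these made-up numbers the single car at "east" outvotes the two cars at "north" (cumulative gain 30 versus 23), which shows that the vote weighs expected time saved rather than the number of waiting cars.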