-
Visualize learned policy of the RL agents
- show policy in SUMO simulation for manual inspection & sanity check
- produce graphs where policies can be compared -> gain insight into how the policy changes when training on noisy data
-
check any new research on the topic
-
✔ Checked: double-check which learning algorithm is used in the LibSignal code -> The implementation of PressLight uses Double DQN (DDQN).
The actor is updated using the Bellman equation every `update_model_rate` steps.
The target network is updated every `update_target_rate` steps by copying the actor's weights.
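The update scheme described above can be sketched in a few lines. This is a minimal pure-Python illustration of a Double DQN target and of the two update intervals, not the LibSignal code: the function names `ddqn_target` and `update_schedule`, the list-based Q-values, and all numbers are assumptions made for the example.

```python
def ddqn_target(q_actor_next, q_target_next, reward, done, gamma):
    """Double DQN target for one transition (hypothetical helper).

    The actor network selects the best next action, the target network
    supplies the value of that action.
    """
    best_action = max(range(len(q_actor_next)), key=lambda a: q_actor_next[a])
    bootstrap = 0.0 if done else gamma * q_target_next[best_action]
    return reward + bootstrap


def update_schedule(total_steps, update_model_rate, update_target_rate):
    """Return the steps at which the actor is trained and the target is synced."""
    model_updates, target_syncs = [], []
    for step in range(1, total_steps + 1):
        if step % update_model_rate == 0:
            model_updates.append(step)   # Bellman update of the actor
        if step % update_target_rate == 0:
            target_syncs.append(step)    # copy actor weights into the target
    return model_updates, target_syncs


# Made-up Q-values for two actions in the next state.
example_target = ddqn_target([1.0, 2.0], [0.8, 1.5], reward=-1.0, done=False, gamma=0.5)
model_updates, target_syncs = update_schedule(20, update_model_rate=5, update_target_rate=10)
```

In this toy setting the actor prefers action 1, so the target uses the target network's value for action 1 rather than the target network's own maximum, which is what separates Double DQN from plain DQN.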
Is DDQN still a good enough choice of algorithm in 2024? It's very simple and easy to implement.
-
✔ Done: Verify that the training and test datasets are different. Repeat experiments if not. Options to get more data:
- Train on synthetic data with different arrival rates, test on real data
- reverse time and/or destinations of first dataset for more diverse data
Results:
Shortening the flow file to the first 140 vehicles instead of all ~2700 made both training and testing significantly faster. Because testing sped up as well, we conclude that the agents are trained and tested on the same data. However, after training an agent on the small dataset and testing it on the full dataset, the agent performed almost identically. This indicates that the dataset is diverse enough to train on without overfitting. The agent also does not get any time information as input, so learning a perfect sequence of actions specific to this dataset is not possible.
Testing the extreme: training on a dataset with just one vehicle and testing on the full dataset does indeed result in extremely poor performance, as the network learns to always give green to the same direction. This is of course not a good policy for the full dataset.
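The shortening step can be reproduced with a small script. The following is a sketch only: it assumes the flow file is a SUMO route file whose demand entries are `<vehicle>`, `<trip>` or `<flow>` elements sorted by departure time, and the helper name `truncate_routes` and the inline example data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element types treated as demand entries (assumption about the file layout).
DEMAND_TAGS = {"vehicle", "trip", "flow"}


def truncate_routes(xml_text, max_entries):
    """Keep only the first `max_entries` demand entries of a route file."""
    root = ET.fromstring(xml_text)
    kept = 0
    for child in list(root):          # copy the list, we remove while iterating
        if child.tag in DEMAND_TAGS:
            if kept >= max_entries:
                root.remove(child)
            else:
                kept += 1
    return ET.tostring(root, encoding="unicode")


# Tiny made-up route file; vType definitions are left untouched.
example = """<routes>
    <vType id="car" length="5.0"/>
    <vehicle id="v0" type="car" depart="0"><route edges="e1 e2"/></vehicle>
    <vehicle id="v1" type="car" depart="5"><route edges="e1 e2"/></vehicle>
    <vehicle id="v2" type="car" depart="9"><route edges="e2 e3"/></vehicle>
</routes>"""

shortened = truncate_routes(example, max_entries=2)
remaining_ids = [v.get("id") for v in ET.fromstring(shortened).iter("vehicle")]
```

For the experiment above, `max_entries` would be 140; writing the result to a separate file keeps the full dataset available for testing.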
Notes:
Vehicles are added into the simulation in `world/world_sumo_disturbed.py` (ln. 532) based on a SUMO config. Upon initialisation, the class `World` generates a terminal command used to start SUMO. This contains info like the road network file and the route/flow file (see ln. 370 in `world/world_sumo_disturbed.py`). Which files are loaded is determined by the SUMO config file found in `configs/sim` (e.g. `sumo1x3.cfg`). This config is loaded to get the relevant file paths. Specifically, the simulation loads the combined file (`.sumoconfig`), which then specifies the SUMO `.xml` files for network and routes/flow data.
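To check by hand which network and route files a run will use, the combined SUMO configuration can be inspected directly. The sketch below is not the `World` implementation: it assumes the usual SUMO layout with `net-file` and `route-files` entries under `<input>`, and the helper name `read_sumo_inputs` and the file names are made up.

```python
import xml.etree.ElementTree as ET


def read_sumo_inputs(config_text):
    """Extract network and route file paths from a SUMO configuration string."""
    root = ET.fromstring(config_text)
    net = next(root.iter("net-file"), None)
    routes = next(root.iter("route-files"), None)
    return {
        "net_file": net.get("value") if net is not None else None,
        # SUMO accepts several route files as a comma-separated list.
        "route_files": routes.get("value").split(",") if routes is not None else [],
    }


# Hypothetical configuration content, for illustration only.
example_cfg = """<configuration>
    <input>
        <net-file value="example.net.xml"/>
        <route-files value="example.rou.xml"/>
    </input>
</configuration>"""

inputs = read_sumo_inputs(example_cfg)

# Equivalent command line built from the extracted paths.
command = [
    "sumo",
    "--net-file", inputs["net_file"],
    "--route-files", ",".join(inputs["route_files"]),
]
```

Printing `inputs` for the training run and for the test run is a quick way to confirm whether both point at the same route file.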
During training, the environment is reset for each episode in `tsc_trainer.train()` (see `self.env.reset()`). This does not indicate any switch of datasets between training and testing. Neither does the similar reset in `tsc_trainer.train_test()`. Training and testing with different (shortened) datasets showed clearly that training and testing are, by default, done on the same data.
-
✔ CORRECTED: Verify that the TPR and FPR measured in the simulation are the same as the values that are set.
- Measure TPR & FPR in simulation to verify that a given setting leads to the expected results
- Add correction factor for TPR/FPR. Currently: measured incorrect detections = (detected vehicles) / TPR ⋅ FPR = (incorrect detections) / TPR
- The correctness of the simulation can be tested using `src/LibSignal_modified/tpr_test.py`.
  In the original version, the measured FPR was about half of the expected FPR and depended on the TPR.
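The idea behind such a check can be illustrated with a small Monte Carlo sketch. It does not reproduce `tpr_test.py`. It assumes a simple noise model in which an occupied detector cell is reported with probability TPR and an empty cell is falsely reported with probability FPR; the function names, the occupancy value, and the sample count are chosen for the example only.

```python
import random


def simulate_detector(occupied, tpr, fpr, rng):
    """Noisy reading of one detector cell under the assumed noise model."""
    if occupied:
        return rng.random() < tpr   # true positive with probability TPR
    return rng.random() < fpr       # false positive with probability FPR


def measure_rates(tpr, fpr, n_samples=200_000, occupancy=0.3, seed=0):
    """Estimate TPR and FPR from simulated readings."""
    rng = random.Random(seed)
    tp = fn = fp = tn = 0
    for _ in range(n_samples):
        occupied = rng.random() < occupancy
        detected = simulate_detector(occupied, tpr, fpr, rng)
        if occupied and detected:
            tp += 1
        elif occupied:
            fn += 1
        elif detected:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


measured_tpr, measured_fpr = measure_rates(tpr=0.8, fpr=0.1)
```

If the measured FPR shifts when only the TPR setting is varied, the two rates are coupled somewhere in the noise generation, which is the kind of dependency reported for the original version.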
-
✔ Tested: More repetitions of experiments to get more reliable results
Testing the model 75 instead of 15 times for a single noise setting did not make much of a difference in the measured metrics. Given the computational cost, we will likely not increase the number of repetitions much.
-
(✔) preliminary check done: Understand & explain why the agent trained on disturbed data performs worse on clean data than on disturbed data. (see Fig. 10 travel time chart)
This no longer seems to hold after the FPR correction. The behavior may have been caused by a bug in the code that is now fixed.
-
Write instructions for how to reproduce the experiments, add `requirements.txt` etc.
-
Verify results in a second environment (road network)
-
Test more different agents (FRAP, CoLight)
- Agents trained on noisy data worked much better than expected; we don't know why yet.
  This could be caused by an undesirable policy that switches between phases constantly or by exploiting certain traffic flows.
- Video showing agent policy in SUMO simulation
- Graph showing phase of each intersection over time
- See Fig. 6 in 20201201 Robust RL for TSC:
  Queue length over time with green direction marked on the x-axis (per intersection)
- See Fig. 9 & 10 in 20190804 Presslight:
  Fig. 9: Green wave visualization:
  Distance travelled by vehicles in one direction over time with intersection phases in direction of travel marked on horizontal stripes
  Fig. 10: Space-time with signal phases:
  Distance travelled by vehicles in two opposite directions over time with intersection phases in direction of travel marked on horizontal stripes
- See Fig. 9 in 20180819 Intellilight:
  Phase time ratio over time for each intersection
Phase over time
- See Fig. 6 in 20201201 Robust RL for TSC
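Before plotting, two simple numbers already help to spot a policy that switches phases constantly: the number of phase switches and the share of time spent in each phase. The sketch below assumes that the active phase of one intersection has been logged once per simulation step; the function name `phase_statistics` and the example log are hypothetical.

```python
from collections import Counter


def phase_statistics(phases):
    """Count phase switches and compute the time share of each phase.

    `phases` holds one phase id per simulation step for a single intersection.
    """
    switches = sum(1 for prev, cur in zip(phases, phases[1:]) if prev != cur)
    counts = Counter(phases)
    ratios = {phase: n / len(phases) for phase, n in counts.items()}
    return switches, ratios


# Made-up log of ten steps.
log = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
switches, ratios = phase_statistics(log)
```

Computing these values per intersection for agents trained on clean and on noisy data gives a first comparison; the same per-step log can then feed the phase-over-time and phase-time-ratio graphs listed above.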
| Paper Title | Road Network(s) | RL Algorithm | Simulator |
|---|---|---|---|
| Robust RL for TSC | single intersection | Double DQN | SUMO |
| PressLight | Qingdao Rd., Jinan 1x3; Beaver Ave., State College 1x5; 8th to 11th Ave., NYC (4 separate networks), 1x16 each | DQN with experience replay | CityFlow |
| MPLight / A Thousand Lights | 4x4 grid; Manhattan | DQN with (shared) experience replay | CityFlow |
| FRAP (phase competition signal control) | Atlanta 1x5; Jinan 7 intersections; Hangzhou 1x6 | DQN (likely with experience replay; unconfirmed) | SUMO |
| CoLight | synth. Arterial 1x3; synth. Grid 3x3; synth. Grid 6x6; Manhattan 196 = 7x28; Gudang Sub-district, Hangzhou, 4x4; Dongfeng Sub-district, Jinan, 3x4 | unknown -> see code | CityFlow |
| RL Benchmarks for TSC | synth. grid 4x4; synth. Ave. grid 4x4; Cologne 1x3; Ingolstadt 1x7 | DQN using the Preferred RL library | SUMO |