Hardware: Google Colab L4
| Model Type | Discrete Actions | Average Reward | Training Time (h:mm:ss) | Total Training Steps | Hugging Face |
|---|---|---|---|---|---|
| PPO | No | 887.84 | 5:33:03 | 751,614 | Link |
| SAC | No | 787.69 | 6:29:16 | 750,000 | Link |
| DQN | Yes | 897.77 | 5:41:22 | 750,000 | Link |

- Set `ent_coef` to a positive value for PPO, since the entropy bonus encourages exploration of other actions. Stable Baselines3 defaults the value to 0.0 (no entropy bonus). More Information
- Do not set your `eval_freq` too low (e.g., keep it >= 10,000), as frequent interruptions for evaluation can sometimes cause instability during learning.
- `buffer_size` defaults to 1,000,000, which requires a significant amount of memory for DQN and SAC. Try setting it to a more practical value (e.g., 200,000) when using the original observation space.
- Set the `gray_scale` flag in the notebooks to `True` to allow DQN and SAC to run without the High-RAM option in Google Colab (buffer size <= 150,000). This converts the observation space from (96 x 96 x 3) RGB images to (84 x 84) grayscale images.
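To see why a positive `ent_coef` encourages exploration, the sketch below computes the entropy term for a diagonal Gaussian policy, the kind PPO uses for CarRacing's continuous 3-dimensional action space (steering, gas, brake). It is a plain-Python illustration, not Stable Baselines3 code: the function names are hypothetical, and it only mirrors SB3's sign convention, in which the minimized PPO loss includes `ent_coef * (-entropy)`. A wider policy (larger standard deviations) has higher entropy and therefore a lower loss, so the optimizer is discouraged from collapsing onto a single action too early. With `ent_coef=0.0` the term vanishes entirely.

```python
import math

def diag_gaussian_entropy(log_stds):
    """Entropy (in nats) of a diagonal Gaussian; it depends only on the std devs."""
    # Each independent dimension contributes 0.5 * log(2 * pi * e * sigma^2)
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + log_std for log_std in log_stds)

def entropy_loss_term(log_stds, ent_coef):
    """Entropy contribution to a loss that is minimized (SB3-style sign convention)."""
    return ent_coef * -diag_gaussian_entropy(log_stds)

# Hypothetical 3-dimensional action distributions (steering, gas, brake)
wide = [0.0, 0.0, 0.0]        # std = 1.0 in every dimension
narrow = [-2.0, -2.0, -2.0]   # std ~ 0.135 in every dimension

for ent_coef in (0.01, 0.05):
    print(
        f"ent_coef={ent_coef}: "
        f"wide={entropy_loss_term(wide, ent_coef):+.4f}, "
        f"narrow={entropy_loss_term(narrow, ent_coef):+.4f}"
    )
```

The wide policy always gets the lower (more negative) loss contribution, and the gap grows with `ent_coef`.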
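One detail affects how often evaluation actually runs: Stable Baselines3's `EvalCallback` counts `eval_freq` in callback calls, and with a vectorized environment each call advances all `n_envs` environments by one step. The helper below (a hypothetical name, not part of SB3) converts a target interval in total timesteps, such as the 10,000 suggested in the tips, into the value to pass, following the `max(eval_freq // n_envs, 1)` adjustment suggested in the SB3 callback documentation.

```python
def eval_freq_per_env(target_timesteps: int, n_envs: int) -> int:
    """Return the EvalCallback eval_freq that evaluates roughly every
    `target_timesteps` total environment steps.

    EvalCallback is triggered once per vectorized env.step(), and each of
    those calls advances all n_envs environments, i.e. n_envs timesteps.
    """
    if n_envs < 1:
        raise ValueError("n_envs must be at least 1")
    return max(target_timesteps // n_envs, 1)

# Evaluate about every 10,000 timesteps with 1, 4 and 8 parallel environments
for n_envs in (1, 4, 8):
    print(n_envs, eval_freq_per_env(10_000, n_envs))

# The result would then be passed as, e.g.:
# EvalCallback(eval_env, eval_freq=eval_freq_per_env(10_000, n_envs), n_eval_episodes=5)
```

With 8 environments, for example, this gives `eval_freq=1250`, which still corresponds to one evaluation per 10,000 timesteps.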
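A back-of-the-envelope estimate shows why the default `buffer_size` is a problem for image observations. The sketch assumes uint8 pixels, single (unstacked) frames, one environment, and a separate array for next observations, which is what Stable Baselines3's `ReplayBuffer` allocates when `optimize_memory_usage=False`; actions, rewards, and done flags are ignored because they are tiny by comparison. The function name is hypothetical.

```python
def replay_buffer_bytes(buffer_size, obs_shape, bytes_per_value=1, store_next_obs=True):
    """Rough memory estimate for the observation arrays of a replay buffer.

    Assumes 1 byte per pixel (uint8) and, by default, a second array of the
    same size for next observations. Actions, rewards and dones are ignored.
    """
    values_per_obs = 1
    for dim in obs_shape:
        values_per_obs *= dim
    copies = 2 if store_next_obs else 1
    return buffer_size * values_per_obs * bytes_per_value * copies

for label, size, shape in [
    ("default, RGB", 1_000_000, (96, 96, 3)),
    ("reduced, RGB", 200_000, (96, 96, 3)),
    ("grayscale", 150_000, (84, 84)),
]:
    gb = replay_buffer_bytes(size, shape) / 1e9
    print(f"{label:>13}: {gb:6.2f} GB")
```

Under these assumptions the default buffer needs about 55 GB, a 200,000-transition RGB buffer about 11 GB, and a 150,000-transition grayscale buffer about 2.1 GB; an 84 x 84 grayscale frame is roughly 3.9 times smaller than a 96 x 96 x 3 RGB frame. If frames are stacked, multiply by the number of stacked frames.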
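The grayscale conversion described in the tips can be sketched in NumPy as below. This is an illustrative assumption, not the notebooks' actual preprocessing (which is not shown here): it uses the ITU-R BT.601 luma weights and a simple nearest-neighbour resize, and the function name is hypothetical. Gymnasium's observation wrappers or OpenCV are common alternatives in practice.

```python
import numpy as np

def to_grayscale_84(frame: np.ndarray) -> np.ndarray:
    """Convert a (96, 96, 3) uint8 RGB frame to an (84, 84) uint8 grayscale frame."""
    # ITU-R BT.601 luma weights, the same ones OpenCV's RGB->GRAY conversion uses
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights  # shape (96, 96)
    # Nearest-neighbour resize: pick 84 evenly spaced row/column indices out of 96
    rows = (np.arange(84) * frame.shape[0]) // 84
    cols = (np.arange(84) * frame.shape[1]) // 84
    resized = gray[rows[:, None], cols[None, :]]
    # Round back to valid uint8 pixel values
    return np.clip(np.rint(resized), 0, 255).astype(np.uint8)

# Usage with a random stand-in for a CarRacing observation
obs = np.random.default_rng(0).integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
gray = to_grayscale_84(obs)
print(obs.shape, "->", gray.shape, gray.dtype)
```

Nearest-neighbour sampling keeps the sketch dependency-free; an area or bilinear resize would preserve thin track edges somewhat better.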


