Add termination condition based on percentage of visited tiles for Car Racing #1323
@VincenzoPalma To clarify, what was the previous behaviour when the agent had crossed the lap completion percentage and reached the end of the lap? Could you train an agent using PPO from SB3 and share the training graphs for the old and new versions?
@pseudo-rnd-thoughts The previous behavior in that scenario was that the environment would not terminate upon completing the lap but would instead continue until reaching the time limit, at which point it would end. I will try to train an agent as you suggest as soon as I can.
@VincenzoPalma Have you had any time to train agents for the different Car Racing versions?
I've only had a few days to work on this so far, and since it's my first time using SB3, it's taking a bit longer. I've obtained some training graphs for 25, 50, 75 and 90 percent of the track covered, but something seems off, so I'm conducting more in-depth testing.
Thanks for doing that @VincenzoPalma, keep me updated here or on Discord if you are uncertain how to get something working.
Can I share TensorBoard log files to show you the training graphs?
You can either share images on GitHub or message me on Discord so that I can look at the files.
I'll share some images as soon as I obtain the graphs with 75 as the minimum percentage of visited tiles. I'll also share the code so you can check whether it's correct for the task and give feedback. If it's good, I'll get the training graphs for the other percentages.
Thanks for the graphs @VincenzoPalma. Overall, I'm surprised that the episode reward is always negative. This might be a feature of the environment, but I would have expected that the reward could be positive.
My initial assumption is that 500k steps might not be sufficient to achieve a positive reward. I initially set 500k steps mainly to compare the agents before and after the change, but I could try increasing it to a value more in line with the numbers used in the benchmarks you shared to see if it yields better results.
Here are some tuned hyperparameters for Car Racing v2 that you could test. However, the SB3 logs don't have a v2, only v0 (https://huggingface.co/sb3/ppo-CarRacing-v0).
Ohh, those are way better results. Yeah, if you can do that, use the same hyperparameters and 3 different percentages with both the old and new environments to show the differences.
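One way to organise the comparison requested above is a small experiment grid. The sketch below is only an illustration: the environment IDs stand in for whichever builds hold the old and new behaviour, the three percentages are picked from the values mentioned earlier in the thread, the step budget is the 500k figure discussed above, and PPO is left on SB3 defaults rather than the tuned hyperparameters. `Run`, `EXPERIMENT_GRID` and `train` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Run:
    """One training run in the old-vs-new comparison (hypothetical helper)."""
    env_id: str
    lap_complete_percent: float
    total_timesteps: int


# Assumption: placeholders for the builds with the old and new behaviour.
ENV_IDS = ("CarRacing-v2", "CarRacing-v3")
# Assumption: three of the percentages mentioned in the thread.
PERCENTAGES = (0.50, 0.75, 0.90)
# Step budget discussed in the thread; increase it if rewards stay negative.
TOTAL_TIMESTEPS = 500_000

# Every environment version paired with every percentage: 2 x 3 = 6 runs.
EXPERIMENT_GRID = [
    Run(env_id, pct, TOTAL_TIMESTEPS)
    for env_id, pct in product(ENV_IDS, PERCENTAGES)
]


def train(run: Run, log_dir: str = "./tb_logs"):
    """Train PPO on one grid entry and write TensorBoard logs to log_dir."""
    # Imported lazily so the grid can be inspected without the RL stack installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(run.env_id, lap_complete_percent=run.lap_complete_percent)
    # SB3 defaults; swap in the tuned hyperparameters for a real comparison.
    model = PPO("CnnPolicy", env, tensorboard_log=log_dir, verbose=0)
    model.learn(
        total_timesteps=run.total_timesteps,
        tb_log_name=f"{run.env_id}_{run.lap_complete_percent}",
    )
    env.close()
    return model
```

Calling `train(run)` for each entry in `EXPERIMENT_GRID` would give one TensorBoard curve per configuration, which makes the old and new versions directly comparable at the same percentage.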
I've been following your progress and wanted to pass along something that I noticed when I was training my agent. I think that v3 of the environment provides a better training environment than our current test version. When I was training my agent, I considered modifying the environment so that every time the agent crossed the start/finish line, the environment checked whether the agent had covered enough tiles to consider the lap complete, but didn't terminate the race unless it had. My thinking was that this would help the agent learn to stay on the road, because the agent could continue learning for the entirety of the time limit. If you started training with a generous time limit (say 5000 steps), an agent could still "complete the race" even if it took 2 or 3 laps to do so. This would require modifying the truncation and termination conditions we defined in #1269, but I think it would also speed up training and improve agent performance. My concern is that these changes may be significant enough to create a fundamental difference between how the current versions of the environment operate and how future versions would operate if this change were implemented.
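The multi-lap idea above can be sketched in plain Python, independent of the real environment. Everything here is hypothetical: `simulate_multi_lap` is an illustrative helper, the 0.95 threshold is a placeholder, and the crossings are supplied by hand instead of coming from the physics simulation. The 5000-step limit is the example figure from the comment.

```python
def simulate_multi_lap(crossings, lap_complete_percent=0.95, time_limit=5000):
    """Sketch of the proposed rule (not the environment's actual code).

    crossings: list of (step, cumulative_coverage) pairs, one per time the
    car crosses the start/finish line, in chronological order.
    Returns (terminated, truncated, laps_driven).
    """
    laps = 0
    for step, coverage in crossings:
        if step >= time_limit:
            # The time limit is reached before this crossing happens.
            break
        laps += 1
        if coverage >= lap_complete_percent:
            # Enough tiles covered at a line crossing: the race is complete.
            return True, False, laps
        # Otherwise keep driving; coverage keeps accumulating on later laps.
    # No crossing met the threshold in time, so the episode is truncated.
    return False, True, laps


# An agent that needs three laps to cover enough tiles still finishes.
result = simulate_multi_lap([(1200, 0.60), (2500, 0.85), (3900, 0.97)])
print(result)
```

Under this rule an early, sloppy lap no longer ends the episode, so the agent keeps collecting experience until it either covers the required fraction of tiles or runs out of time.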
Description
The environment will now end with `terminated = True` when the lap is completed after reaching the specified percentage of visited tiles.

Fixes #1269
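As a rough illustration of the condition described above, the following sketch computes the two episode flags from a few counters. It is not the environment's implementation: `LapState` and `episode_flags` are hypothetical names, the use of `>=` for the threshold comparison is an assumption, and in practice the time limit is usually enforced by a wrapper, not by the environment itself.

```python
from dataclasses import dataclass


@dataclass
class LapState:
    """Hypothetical container for the quantities discussed in this PR."""
    tiles_visited: int        # distinct track tiles touched so far
    total_tiles: int          # number of tiles in the generated track
    crossed_start_line: bool  # True on the step the car re-enters the first tile
    elapsed_steps: int        # steps taken in the current episode
    max_episode_steps: int    # time limit for the episode


def episode_flags(state: LapState, lap_complete_percent: float) -> tuple[bool, bool]:
    """Return (terminated, truncated) under the behaviour described above."""
    coverage = state.tiles_visited / state.total_tiles
    # Assumption: the lap counts once the car is back at the start line with
    # at least the requested fraction of tiles visited.
    terminated = state.crossed_start_line and coverage >= lap_complete_percent
    # Running out of time without completing the lap is a truncation.
    truncated = (not terminated) and state.elapsed_steps >= state.max_episode_steps
    return terminated, truncated
```

For example, `episode_flags(LapState(80, 100, True, 300, 1000), 0.75)` yields `(True, False)`, while the same call with only 50 tiles visited yields `(False, False)` and the episode continues.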
Type of change
Checklist: