
Episodes and resets during training #594

Answered by erikfrey
Balint-H asked this question in Q&A

Hey Balint,

Answers to your questions:

  1. Yes, vmap/scan means that environments continue to be stepped even after they are done. Brax handles this via an autoreset wrapper that reloads cached state after done == True:

https://github.com/google/brax/blob/main/brax/envs/wrappers/training.py#L151

This means that within a batch of num_envs env States you may see different sim times, for example if one of the envs in the batch terminated early and was reset.

  2. Yes, following on from 1, they contribute useful state.

Think of state.info['first_pipeline_state'] as a cached pool of initial states. The cache size is num_envs - if you think this pool is too small of an initial set of states for RL to explore from, y…
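
To make the autoreset behaviour from point 1 concrete, here is a minimal JAX sketch. It is not the Brax wrapper itself: `toy_step`, `autoreset_step`, the dict-based state and the `first_x` cache are hypothetical stand-ins for an env step, the wrapper, and `state.info['first_pipeline_state']`. It only illustrates the idea that every env in the batch keeps being stepped under vmap/scan, and finished envs are overwritten with their cached initial state.

```python
import jax
import jax.numpy as jnp

NUM_ENVS = 3  # hypothetical batch size, plays the role of num_envs


def toy_step(x, t, v):
    """Hypothetical single-env dynamics: move by v, tick the clock, done once x > 1."""
    x = x + v
    t = t + 1
    done = x > 1.0
    return x, t, done


def autoreset_step(state, _):
    """Step every env, then swap finished envs back to their cached first state."""
    x, t, done = jax.vmap(toy_step)(state["x"], state["t"], state["v"])
    # No branching per env: all envs are stepped, and jnp.where selects the
    # cached initial value wherever done is True.
    x = jnp.where(done, state["first_x"], x)
    t = jnp.where(done, jnp.zeros_like(t), t)
    return dict(state, x=x, t=t), done


def rollout(state, num_steps):
    """Unroll num_steps autoreset steps with lax.scan."""
    return jax.lax.scan(autoreset_step, state, None, length=num_steps)


state = {
    "x": jnp.zeros(NUM_ENVS),
    "t": jnp.zeros(NUM_ENVS, dtype=jnp.int32),  # per-env sim time in steps
    "v": jnp.array([0.6, 0.3, 0.1]),            # env 0 terminates early
    "first_x": jnp.zeros(NUM_ENVS),             # cached initial states, one per env
}

final_state, dones = rollout(state, 3)
print(final_state["t"])  # per-env sim times differ once one env has been reset
print(dones)             # done flags per step, shape (num_steps, NUM_ENVS)
```

In this toy run only the first env crosses the termination threshold, so its clock restarts while the other two keep counting, which is the "different sim times in one batch" effect mentioned above. Note the cache here has exactly one initial state per env, mirroring the num_envs-sized pool.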

Answer selected by Balint-H