-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Include initial state manager for BipedalWalker #1305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Include initial state manager for BipedalWalker #1305
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting solution
Could you add tests and some general documentation on how to use this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the changes, I don't understand why the class is required or half the changes.
Could you explain what the changes do and why they are implemented like this?
.gitignore
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these changes
Yes, I'll include an overview, tests and a tutorial here soon. In short, the idea is to provide the possibility of designing the obstacle course, by choosing which obstacles at which point should be included in the terrain and also their characteristics. Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard. |
To me, the class seems unnecessary, or at least in its overly complex. We wish to have an easy-to-understand code base |
I've removed the class and created two internal functions:
Looks simpler to me, let me know what you think. Where can I include examples on how to use this? Here's an example on how this could be used to design a specific terrain (or copy an existing terrain): import gymnasium as gym
GRASS, STUMP, STAIRS, PIT, _STATES_ = range(5)
OBSTACLES = dict(
down_stairs=dict(state=2, metadata=(-1, 4, 2)),
up_stairs=dict(state=2, metadata=(1, 4, 2)),
small_stump=dict(state=1, metadata=1),
large_stump=dict(state=1, metadata=3),
hole=dict(state=3, metadata=3),
)
env = gym.make("BipedalWalker-v3", hardcore=True, render_mode="human")
metadata = dict(
states=[OBSTACLES["hole"], OBSTACLES["hole"], OBSTACLES["large_stump"]],
x_variations=False,
y_variations=True
)
# Use the `options` parameter to pass the terrain metadata and initialize the environment
options = dict(metadata=metadata)
env.reset(options=options)
# Execute for visualization
for _ in range(100):
# this is where you would insert your policy
action = env.action_space.sample()
# step (transition) through the environment with the action
# receiving the next observation, reward and if the episode has terminated or truncated
observation, reward, terminated, truncated, info = env.step(action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arthur-plautz Your changes make way more sense, thanks for the changes.
Could you add documentation in the class docstring with a section on determining terrian with a note in the version section.
Then could you add some tests in tests/envs/test_env_implementations.py
to check that the implementation works as expected.
@@ -281,7 +288,80 @@ def _destroy(self): | |||
self.legs = [] | |||
self.joints = [] | |||
|
|||
def _process_terrain_metadata(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a docstring to explain what the function does
self.terrain_grass = TERRAIN_GRASS | ||
self._terrain_metadata = dict(states=[]) | ||
|
||
def _generate_terrain_state(self, state: int) -> any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring
|
||
if self._predefined_terrain: | ||
if state == GRASS: | ||
next_state = self._terrain_metadata["states"][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why for GRASS do we only get the first while for the next state we pop the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the grass state doesn't have any significant metadata to store, it can be used as a transition state, as defined in the code you highlighted.
Lines 474 - 479 define that a new state is always randomly generated from a grass state, which makes sense, since the grass is the intermediary state between obstacles. This way, when the generator enters in a grass state, it has to either retrieve the next state stored or store it, with the goal of signalizing what obstacle should be generated next. Then, when entering that obstacle state, it removes it from the state list and processes its generation in the terrain using the provided metadata.
Description
The changes described in this PR contribute with the possibility of controlling the terrain generation for the
BipedalWalkerV3
.Being able to retrieve and configure a specific terrain for this environment can be very useful for multiple use-cases, by allowing the user to have a greater visibility and control over the terrain.
The
TerrainMetadata
class was added to manage the read/write operations over random values generated to build the terrain, when calling theenv.reset
method. The code snippet below showcases the use of the this new functionality:Further details on the motivation of this change are described in the referenced issue below.
Fixes # 892
Type of change
Please delete options that are not relevant.
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)