Include initial state manager for BipedalWalker #1305

arthur-plautz · 2025-02-10T03:47:42Z

Description

This PR adds the ability to control terrain generation in the BipedalWalker-v3 environment.
Retrieving and configuring a specific terrain is useful for several use cases, because it gives the user more visibility into and control over the terrain the agent faces.

The TerrainMetadata class was added to manage read/write operations on the random values generated to build the terrain when env.reset is called. The code snippet below shows how to use the new functionality:

import gymnasium as gym

env = gym.make('BipedalWalker-v3')

# Use the default reset method / This will populate the `_terrain_metadata` property
env.reset()

# Retrieve the metadata directly from the private `_terrain_metadata` attribute of the unwrapped env
bipedal_env = env.unwrapped
metadata = bipedal_env._terrain_metadata

# Use the `options` parameter to pass the terrain metadata and initialize the environment
options = dict(
    metadata=metadata
)
# `seed` and `options` are keyword-only arguments of `reset`
env.reset(options=options)

Further details on the motivation of this change are described in the referenced issue below.

Fixes #892

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

pseudo-rnd-thoughts

Interesting solution

Could you add tests and some general documentation on how to use this?

pseudo-rnd-thoughts

Looking through the changes, I don't understand why the class is required or half the changes.
Could you explain what the changes do and why they are implemented like this?

pseudo-rnd-thoughts · 2025-03-25T15:03:34Z

.gitignore

Remove these changes

arthur-plautz · 2025-04-01T14:27:46Z

> Looking through the changes, I don't understand why the class is required or half the changes. Could you explain what the changes do and why they are implemented like this?

Yes, I'll include an overview, tests and a tutorial here soon. In short, the idea is to let users design the obstacle course by choosing which obstacles are included, where they appear in the terrain, and what their characteristics are.

Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard.

pseudo-rnd-thoughts · 2025-04-10T00:04:28Z

> Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard.

To me, the class seems unnecessary, or at least overly complex. We wish to have an easy-to-understand code base.

arthur-plautz · 2025-04-11T13:06:48Z

> > Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard.
>
> To me, the class seems unnecessary, or at least overly complex. We wish to have an easy-to-understand code base

@pseudo-rnd-thoughts

I've removed the class and created two internal functions:

_process_terrain_metadata: Initializes the terrain metadata properties
_generate_terrain_state: From a given state, generates/copies the corresponding metadata (height, stairs length, size, etc.)

Looks simpler to me, let me know what you think.
I've also added an option to remove the fall down penalty, since it may have a negative impact when using evolutionary strategies, as stated in this paper (https://arxiv.org/abs/2205.07592).

Where can I include examples on how to use this?

Here's an example of how this could be used to design a specific terrain (or copy an existing terrain):

import gymnasium as gym

GRASS, STUMP, STAIRS, PIT, _STATES_ = range(5)
OBSTACLES = dict(
    # `state` uses the named constants above instead of bare integers
    down_stairs=dict(state=STAIRS, metadata=(-1, 4, 2)),
    up_stairs=dict(state=STAIRS, metadata=(1, 4, 2)),
    small_stump=dict(state=STUMP, metadata=1),
    large_stump=dict(state=STUMP, metadata=3),
    hole=dict(state=PIT, metadata=3),
)

env = gym.make("BipedalWalker-v3", hardcore=True, render_mode="human")

metadata = dict(
    states=[OBSTACLES["hole"], OBSTACLES["hole"], OBSTACLES["large_stump"]],
    x_variations=False,
    y_variations=True
)

# Use the `options` parameter to pass the terrain metadata and initialize the environment
options = dict(metadata=metadata)
env.reset(options=options)

# Execute for visualization
for _ in range(100):
    # this is where you would insert your policy
    action = env.action_space.sample()

    # step (transition) through the environment with the action
    # receiving the next observation, reward and if the episode has terminated or truncated
    observation, reward, terminated, truncated, info = env.step(action)

    # a random policy may fall early; rebuild the same designed terrain and continue
    if terminated or truncated:
        observation, info = env.reset(options=options)

env.close()
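
A course like the one above can also be assembled by name. The following is only a sketch: `build_terrain_options` is a hypothetical helper, not part of this PR or of Gymnasium, and the layout of the `metadata` dict (`states`, `x_variations`, `y_variations`) is simply copied from the example in this comment rather than checked against the implementation.

```python
# State indices, matching the first four values of the example's range(5).
GRASS, STUMP, STAIRS, PIT = range(4)

# Obstacle catalogue copied from the example above (values not verified
# against the PR's implementation).
OBSTACLES = {
    "down_stairs": {"state": STAIRS, "metadata": (-1, 4, 2)},
    "up_stairs": {"state": STAIRS, "metadata": (1, 4, 2)},
    "small_stump": {"state": STUMP, "metadata": 1},
    "large_stump": {"state": STUMP, "metadata": 3},
    "hole": {"state": PIT, "metadata": 3},
}


def build_terrain_options(names, x_variations=False, y_variations=True):
    """Hypothetical helper: turn obstacle names into a reset `options` dict."""
    unknown = [name for name in names if name not in OBSTACLES]
    if unknown:
        raise KeyError(f"unknown obstacle(s): {unknown}")
    # Copy each spec so the catalogue is never shared with the returned course.
    states = [dict(OBSTACLES[name]) for name in names]
    return {
        "metadata": {
            "states": states,
            "x_variations": x_variations,
            "y_variations": y_variations,
        }
    }


# Same course as in the example: two holes followed by a large stump.
options = build_terrain_options(["hole", "hole", "large_stump"])
print(options["metadata"]["states"])
```

Calling the helper before every `env.reset(options=...)` gives each reset a fresh `states` list, which is the cautious choice if the generator consumes entries from that list while building the terrain.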

pseudo-rnd-thoughts

@arthur-plautz Your changes make way more sense, thanks for the changes.

Could you add documentation to the class docstring, with a section on determining the terrain and a note in the version section?
Then could you add some tests in tests/envs/test_env_implementations.py to check that the implementation works as expected?

pseudo-rnd-thoughts · 2025-04-13T12:33:08Z

gymnasium/envs/box2d/bipedal_walker.py

@@ -281,7 +288,80 @@ def _destroy(self):
        self.legs = []
        self.joints = []

+    def _process_terrain_metadata(self):


Could you add a docstring to explain what the function does?

pseudo-rnd-thoughts · 2025-04-13T12:43:11Z

gymnasium/envs/box2d/bipedal_walker.py

+            self.terrain_grass = TERRAIN_GRASS
+            self._terrain_metadata = dict(states=[])
+
+    def _generate_terrain_state(self, state: int) -> any:


Add docstring

pseudo-rnd-thoughts · 2025-04-13T20:28:53Z

gymnasium/envs/box2d/bipedal_walker.py

+
+        if self._predefined_terrain:
+            if state == GRASS:
+                next_state = self._terrain_metadata["states"][0]


Why do we only read the first element for GRASS, while for the next state we pop it?

arthur-plautz · Jun 12, 2025

Since the grass state doesn't have any significant metadata to store, it can be used as a transition state, as defined in the code you highlighted.
Lines 474 - 479 define that a new state is always randomly generated from a grass state, which makes sense, since grass is the intermediate state between obstacles. So when the generator enters a grass state, it either retrieves the stored next state or stores it, in order to signal which obstacle should be generated next. Then, when the generator enters that obstacle state, it removes the entry from the state list and generates the obstacle in the terrain using the provided metadata.
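
The peek-on-grass, pop-on-obstacle behaviour described in this reply can be illustrated with a toy state machine. This is a simplified sketch, not the code from `bipedal_walker.py`: `walk_predefined` and the `deque` are assumptions made for illustration, and the real generator also handles terrain heights, lengths and random variation.

```python
from collections import deque

GRASS, STUMP, STAIRS, PIT = range(4)


def walk_predefined(states):
    """Toy model: return the (state, metadata) sequence the generator visits.

    `states` is a list of dicts such as {"state": STUMP, "metadata": 1}.
    """
    queue = deque(states)
    trace = []
    state = GRASS
    while True:
        if state == GRASS:
            # Grass has no metadata of its own, so it only *peeks* at the
            # next entry to learn which obstacle follows.
            trace.append((GRASS, None))
            if not queue:
                break
            state = queue[0]["state"]
        else:
            # Entering the obstacle *consumes* its entry and uses its metadata.
            entry = queue.popleft()
            trace.append((entry["state"], entry["metadata"]))
            state = GRASS
    return trace


trace = walk_predefined(
    [{"state": PIT, "metadata": 3}, {"state": STUMP, "metadata": 3}]
)
print(trace)
```

If the grass step popped the entry instead of peeking, the obstacle step would no longer find its metadata, which is why the two states treat the list differently.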

arthur-ventura-astro added 2 commits February 10, 2025 00:12

Add BipedalWalker initial state manager

cce6986

Add BipedalWalker interface for retrieving terrain metadata

d4e43a6

arthur-plautz mentioned this pull request Feb 10, 2025

[Proposal] Initial State Tracking - Bipedal Walker Hardcore #892

Open

1 task

pseudo-rnd-thoughts requested changes Feb 10, 2025

arthur-ventura-astro added 4 commits March 11, 2025 12:14

Add designed environment mapping

93884de

Add terrain metadata documentation

821eb0d

Add grass variation logic for grass length

b6f6c3f

Splitted grass variations on x/y

34bbd2e

pseudo-rnd-thoughts reviewed Mar 25, 2025

arthur-plautz reacted with thumbs up emoji

Removing TerrainMetadata class - Refactored terrain metadata generation

d95581d

pseudo-rnd-thoughts requested changes Apr 13, 2025

arthur-ventura-astro and others added 3 commits April 15, 2025 21:59

Reset state of terrain metadata for every reset method call

b48a77f

Add docstrings to terrain metadata methods

a2d2448

Removing local changes from .gitignore

c2dc55d
