Skip to content

Include initial state manager for BipedalWalker #1305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

arthur-plautz
Copy link

@arthur-plautz arthur-plautz commented Feb 10, 2025

Description

The changes described in this PR contribute with the possibility of controlling the terrain generation for the BipedalWalkerV3.
Being able to retrieve and configure a specific terrain for this environment can be very useful for multiple use-cases, by allowing the user to have a greater visibility and control over the terrain.

The TerrainMetadata class was added to manage the read/write operations over random values generated to build the terrain, when calling the env.reset method. The code snippet below showcases the use of the this new functionality:

import gymnasium as gym

env = gym.make('BipedalWalker-v3')

# Use the default reset method / This will populate the `_terrain_metadata` property
env.reset()

# Retrieve the property directly through the `terrain_metadata` method
bipedal_env = env.unwrapped
metadata = bipedal_env._terrain_metadata

# Use the `options` parameter to pass the terrain metadata and initialize the environment
options = dict(
    metadata=metadata
)
env.reset(options)

Further details on the motivation of this change are described in the referenced issue below.

Fixes # 892

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting solution

Could you add tests and some general documentation on how to use this

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the changes, I don't understand why the class is required or half the changes.
Could you explain what the changes do and why they are implemented like this?

.gitignore Outdated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these changes

@arthur-plautz
Copy link
Author

arthur-plautz commented Apr 1, 2025

Looking through the changes, I don't understand why the class is required or half the changes. Could you explain what the changes do and why they are implemented like this?

Yes, I'll include an overview, tests and a tutorial here soon. In short, the idea is to provide the possibility of designing the obstacle course, by choosing which obstacles at which point should be included in the terrain and also their characteristics.

Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard.

@pseudo-rnd-thoughts
Copy link
Member

Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard.

To me, the class seems unnecessary, or at least in its overly complex. We wish to have an easy-to-understand code base

@arthur-plautz
Copy link
Author

Also, if the design choice for the solution is a problem, I can refactor it to suit a specific standard.

To me, the class seems unnecessary, or at least in its overly complex. We wish to have an easy-to-understand code base

@pseudo-rnd-thoughts

I've removed the class and created two internal functions:

  • _process_terrain_metadata: Initializes the terrain metadata properties
  • _generate_terrain_state: From a given state, generates/copies the corresponding metadata (height, stairs length, size, etc.)

Looks simpler to me, let me know what you think.
I've also added an option to remove the fall down penalty, since it may have a negative impact when using evolutionary strategies, as stated in this paper (https://arxiv.org/abs/2205.07592).

Where can I include examples on how to use this?

Here's an example on how this could be used to design a specific terrain (or copy an existing terrain):

import gymnasium as gym

GRASS, STUMP, STAIRS, PIT, _STATES_ = range(5)
OBSTACLES = dict(
    down_stairs=dict(state=2, metadata=(-1, 4, 2)),
    up_stairs=dict(state=2, metadata=(1, 4, 2)),
    small_stump=dict(state=1, metadata=1),
    large_stump=dict(state=1, metadata=3),
    hole=dict(state=3, metadata=3),
)

env = gym.make("BipedalWalker-v3", hardcore=True, render_mode="human")

metadata = dict(
    states=[OBSTACLES["hole"], OBSTACLES["hole"], OBSTACLES["large_stump"]],
    x_variations=False,
    y_variations=True
)

# Use the `options` parameter to pass the terrain metadata and initialize the environment
options = dict(metadata=metadata)
env.reset()

# Execute for visualization
for _ in range(100):
    # this is where you would insert your policy
    action = env.action_space.sample()

    # step (transition) through the environment with the action
    # receiving the next observation, reward and if the episode has terminated or truncated
    observation, reward, terminated, truncated, info = env.step(action)

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arthur-plautz Your changes make way more sense, thanks for the changes.

Could you add documentation in the class docstring with a section on determining terrian with a note in the version section.
Then could you add some tests in tests/envs/test_env_implementations.py to check that the implementation works as expected.

@@ -281,7 +288,80 @@ def _destroy(self):
self.legs = []
self.joints = []

def _process_terrain_metadata(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring to explain what the function does

self.terrain_grass = TERRAIN_GRASS
self._terrain_metadata = dict(states=[])

def _generate_terrain_state(self, state: int) -> any:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring


if self._predefined_terrain:
if state == GRASS:
next_state = self._terrain_metadata["states"][0]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why for GRASS do we only get the first while for the next state we pop the first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants