Guide: OpenEnv + SkyRL

This directory holds the workflow to train on PyTorch OpenEnv environments with SkyRL.

This guide walks through training a reinforcement learning agent with SkyRL on PyTorch OpenEnv environments. OpenEnv provides isolated execution environments for agentic RL training, exposed through Gymnasium-style APIs.

Start by following the SkyRL installation instructions, then enter the skyrl-train directory:

cd SkyRL/skyrl-train

1) Environment Setup

Prerequisites: Ensure that Docker is installed. The required OpenEnv environment images must be available locally; the install step below pulls them.

First, install the OpenEnv environments (i.e., download the images for each environment):

# Execute from skyrl-train directory
uv run integrations/openenv/install_environment.py echo-env
# Or install all environments:
# uv run integrations/openenv/install_environment.py

This will pull the necessary Docker images for the OpenEnv environments.

Available environments: echo-env, coding-env, openspiel-env, atari-env, sumo-rl-env, finrl-env.

2) Dataset Preparation

For training, we use simple example datasets generated by the prepare_dummy_dataset.py script:

# Execute from skyrl-train directory
uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv --env_name echo_env
# Or generate datasets for all environments:
# uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv 

This creates training and validation datasets with example prompts for the specified environment (examples are currently provided for two environments, echo_env and coding_env).

prepare_dummy_dataset.py accepts the following optional parameters:

  • --output_dir: directory to place datasets (default: ~/data/openenv)
  • --env_name: specific environment to prepare dataset for (default: all environments)

Notes on dataset generation:

  • This script will generate the following Parquet files under output_dir:
    • train.parquet
    • validation.parquet
  • For issues in loading the dataset, see the Troubleshooting section below.
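
As a quick sanity check after running the script, the sketch below confirms that both Parquet files listed above exist. It uses only the standard library. The helper name missing_dataset_files is hypothetical and not part of SkyRL, and the directory you pass in should be wherever the files actually landed under your output_dir.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("train.parquet", "validation.parquet")


def missing_dataset_files(dataset_dir: str) -> list[str]:
    """Return the expected Parquet files that are absent from dataset_dir."""
    # expanduser() resolves a leading "~" the same way the shell does.
    root = Path(dataset_dir).expanduser()
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling missing_dataset_files on your dataset directory returns an empty list when both files are present, and the names of the absent files otherwise.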

3) Training

We provide an example training script for Qwen2.5-0.5B-Instruct on OpenEnv environments:

# Execute from skyrl-train directory
bash integrations/openenv/run_openenv.sh

Currently, the supported environments are: echo_env, coding_env, openspiel-env, atari-env, sumo-rl-env, finrl-env. You can customize the training run by setting environment variables:

ENV_NAME=coding_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh

Or modify the commonly-edited training settings in run_openenv.sh as needed:

ENV_NAME="coding_env"
DATA_DIR="$HOME/data/openenv/$ENV_NAME"
NUM_GPUS=4
LOGGER="wandb"

All training parameters can be modified in run_openenv.sh, such as the model choice (trainer.policy.model.path), GRPO group size (generator.n_samples_per_prompt), or training batch size (trainer.train_batch_size).

See all available training configuration parameters in ppo_base_config.yaml.
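
The environment-variable overrides described above can also be applied when launching the script from Python, for example from a small sweep driver. The sketch below is one way to do that under stated assumptions: build_launch_env and launch are hypothetical helpers, not part of SkyRL, and the variable names are the ones mentioned in this guide.

```python
import os
import subprocess


def build_launch_env(overrides: dict[str, object]) -> dict[str, str]:
    """Copy the current environment and apply the overrides as strings."""
    env = dict(os.environ)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def launch(overrides: dict[str, object]) -> int:
    """Run the example training script with the given overrides.

    Call this from the skyrl-train directory, like the shell commands above.
    Returns the exit code of the script.
    """
    completed = subprocess.run(
        ["bash", "integrations/openenv/run_openenv.sh"],
        env=build_launch_env(overrides),
        check=False,
    )
    return completed.returncode
```

For instance, launch({"ENV_NAME": "coding_env", "NUM_GPUS": 2}) mirrors the shell command ENV_NAME=coding_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh.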

Tips

  • Docker Resources: Ensure sufficient Docker resources are available, especially for computationally intensive environments like Atari or OpenSpiel.
  • Generation Format: For dummy testing, each generation is currently expected to contain a single action wrapped in <action>...</action> tags. To use custom parsing logic, change _get_openenv_action in the OpenEnv environment wrapper (integrations/openenv/env.py).
  • Environment Variables: You can override default values with environment variables such as NUM_GPUS=1, ENV_NAME=coding_env, or MAX_TURNS=1.
  • Logging: Set LOGGER=console to print logs to stdout instead of using wandb.
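
To illustrate the <action>...</action> format mentioned in the tips above, here is a minimal parsing sketch. It is not the actual _get_openenv_action implementation, whose signature and behavior are not shown in this guide. extract_action is a hypothetical stand-alone helper that returns the first tagged action, or None when no tag is found.

```python
import re
from typing import Optional

# Non-greedy match so only the first <action>...</action> pair is captured;
# DOTALL lets the action body span multiple lines.
ACTION_PATTERN = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(generation: str) -> Optional[str]:
    """Return the text of the first <action> block, or None if there is none."""
    match = ACTION_PATTERN.search(generation)
    if match is None:
        return None
    # Strip surrounding whitespace so padded tags still yield a clean action.
    return match.group(1).strip()
```

Returning None for untagged output leaves it to the caller to decide how a malformed generation should be handled, for example by substituting a default action or ending the episode.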

Troubleshooting

For issues with SkyRL or the integration with OpenEnv, please open an Issue.

Datasets

All environment integrations currently use dummy datasets. Modify prepare_dummy_dataset.py as needed to extract and prepare the datasets you want to train on.

TODOs and Limitations

We welcome any contributions to help resolve the remaining tasks!

  • Make it easier to specify different OpenEnv environments used for training and validation.
  • Make it smoother to specify which dataset splits to use.