This directory holds the workflow to train on PyTorch OpenEnv environments with SkyRL.
In this guide, we walk through how to train a reinforcement learning agent using SkyRL with PyTorch OpenEnv environments. OpenEnv provides isolated execution environments for agentic RL training with Gymnasium-style APIs.
Start by following the SkyRL installation instructions, then enter the skyrl-train directory:
cd SkyRL/skyrl-train
Prerequisites: Ensure that you have Docker installed and the required OpenEnv environment images pulled locally.
First, install the OpenEnv environments (i.e., download the images for each environment):
# Execute from skyrl-train directory
uv run integrations/openenv/install_environment.py echo-env
# Or install all environments:
# uv run integrations/openenv/install_environment.py
This will pull the necessary Docker images for the OpenEnv environments.
Available environments: echo-env, coding-env, openspiel-env, atari-env, sumo-rl-env, finrl-env.
For training, we use simple example datasets generated by the prepare_dummy_dataset.py script:
# Execute from skyrl-train directory
uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv --env_name echo_env
# Or generate datasets for all environments:
# uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv
This creates training and validation datasets with example prompts for the specified environment (we provide two examples in echo_env and coding_env).
prepare_dummy_dataset.py has additional optional parameters:
- --output_dir: directory to place datasets (default: ~/data/openenv)
- --env_name: specific environment to prepare dataset for (default: all environments)
Notes on dataset generation:
- This script will generate the following Parquet files under output_dir:
  - train.parquet
  - validation.parquet
- For issues in loading the dataset, see the Troubleshooting section below.
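Before launching training, it can help to confirm that both Parquet files are present. The following is a minimal sketch using only the Python standard library. The helper name missing_dataset_files is hypothetical and not part of SkyRL or OpenEnv, and the example path assumes the files sit in a per-environment subdirectory, as DATA_DIR in run_openenv.sh suggests. Adjust it to wherever the script actually wrote the files.

```python
from pathlib import Path

# File names taken from the dataset notes above.
EXPECTED_FILES = ("train.parquet", "validation.parquet")


def missing_dataset_files(data_dir):
    """Return the expected Parquet file names that are absent from data_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that their contents are valid.
    """
    root = Path(data_dir).expanduser()  # expand a leading "~"
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location; change it to match your --output_dir layout.
    missing = missing_dataset_files("~/data/openenv/echo_env")
    print("missing files:", missing or "none")
```

An empty list means both files were found in that directory.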
We provide an example training script for Qwen2.5-0.5B-Instruct on OpenEnv environments:
# Execute from skyrl-train directory
bash integrations/openenv/run_openenv.sh
Currently, the supported environments are: echo_env, coding_env, openspiel-env, atari-env, sumo-rl-env, finrl-env.
You can customize the training by setting environment variables:
ENV_NAME=coding_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh
Or modify the commonly edited training settings in run_openenv.sh as needed:
ENV_NAME="coding_env"
DATA_DIR="$HOME/data/openenv/$ENV_NAME"
NUM_GPUS=4
LOGGER="wandb"
All training parameters can be modified in run_openenv.sh, such as the model choice (trainer.policy.model.path), GRPO group size (generator.n_samples_per_prompt), or training batch size (trainer.train_batch_size).
See all available training configuration parameters in ppo_base_config.yaml.
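If you prefer to start runs from Python (for example, to sweep over environments), the environment-variable overrides described above could be passed through subprocess. This is only a sketch. The helpers build_launch_env and launch are hypothetical names, and it assumes run_openenv.sh reads ENV_NAME and NUM_GPUS from the process environment, as the command-line example above implies.

```python
import os
import subprocess

# Script path as given in this guide; run from the skyrl-train directory.
SCRIPT = "integrations/openenv/run_openenv.sh"


def build_launch_env(overrides, base=None):
    """Return a copy of the base environment with overrides applied.

    base defaults to the current process environment. Values are converted
    to strings because environment variables must be strings.
    """
    env = dict(os.environ if base is None else base)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def launch(overrides):
    """Run the training script with the given overrides (hypothetical helper)."""
    return subprocess.run(
        ["bash", SCRIPT],
        env=build_launch_env(overrides),
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
```

Calling launch({"ENV_NAME": "coding_env", "NUM_GPUS": 2}) would then be roughly equivalent to the one-line shell invocation shown earlier.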
- Docker Resources: Ensure sufficient Docker resources are available, especially for computationally intensive environments like Atari or OpenSpiel.
- Generation Format: The generation format is currently expected to be a single action wrapped in <action>...</action> tags for dummy testing. Change _get_openenv_action in the OpenEnv environment wrapper (integrations/openenv/env.py) for custom parsing logic.
- Environment Variables: You can override default values with environment variables like NUM_GPUS=1, ENV_NAME=coding_env, MAX_TURNS=1, etc.
- Logging: Set LOGGER=console to print logs to stdout instead of using wandb.
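To make the generation format concrete, the sketch below shows one way a single action could be pulled out of <action>...</action> tags. It is an illustration only, not the actual _get_openenv_action implementation. The function name extract_action and the choice to return None when no tags are found are assumptions.

```python
import re
from typing import Optional

# Non-greedy match so only the first <action>...</action> pair is captured;
# DOTALL lets the action text span multiple lines.
ACTION_PATTERN = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(generation: str) -> Optional[str]:
    """Return the text inside the first <action> tag pair, or None if absent.

    Hypothetical helper; surrounding whitespace inside the tags is stripped.
    """
    match = ACTION_PATTERN.search(generation)
    return match.group(1).strip() if match else None
```

A custom parser would typically replace the regular expression with whatever structure the target environment expects, and decide explicitly how to handle generations without a well-formed action.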
For issues with SkyRL or the integration with OpenEnv, please open an Issue.
We currently use dummy datasets for all environment integrations. Please modify prepare_dummy_dataset.py as needed to extract and prepare the correct datasets.
We welcome any contributions to help resolve the remaining tasks!
- Make it easier to specify different OpenEnv environments used for training and validation.
- Make it smoother to specify which dataset splits to use.