[Question] Results vastly different for an agent created with Stable Baselines3 using hyperparameters optimized in RL Baselines3 Zoo. #458

Open
@mzelazko

❓ Question

Hello,
I first optimized A2C for 1 million steps using RL Baselines3 Zoo.

First, I changed a2c.yml in RL Baselines3 Zoo to work with the RAM version of Seaquest:

atari:
  policy: 'MlpPolicy'
  n_envs: 16
  policy_kwargs: "dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5))"

Then I ran this command:

python -m train --algo a2c --env ALE/Seaquest-ram-v5 -n 1000000 -optimize --n-trials 100 --n-startup-trials 10 \
  --sampler tpe --pruner median --n-evaluations 4 --n-eval-envs 16 --storage "some_valid_database" --study-name test

Top 3 results:
[screenshot: mysqlsh_ugrHTRYZRL]
Then, using for example these hyperparameters:
[screenshot: mysqlsh_4UsJxMM74z]
and this code:

import torch
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike


def linear_decay_lr(progress_remaining):
    # progress_remaining goes from 1 (start of training) to 0 (end)
    return 0.00027232300584036946 * progress_remaining


if __name__ == "__main__":
    vec_env = make_vec_env("ALE/Seaquest-ram-v5", n_envs=16)
    model = A2C(
        "MlpPolicy",
        vec_env,
        learning_rate=linear_decay_lr,
        n_steps=256,
        gamma=0.999,
        gae_lambda=0.98,
        ent_coef=0.00001753537605091099,
        vf_coef=0.19195701505334234,
        max_grad_norm=0.5,
        use_rms_prop=True,
        normalize_advantage=False,
        verbose=1,
        tensorboard_log="./seaquest/107",
        policy_kwargs=dict(
            activation_fn=torch.nn.Tanh,
            net_arch=dict(pi=[256, 256], vf=[256, 256]),
            ortho_init=True,
            optimizer_class=RMSpropTFLike,
            optimizer_kwargs=dict(eps=1e-5),
        ),
    )
    model.learn(total_timesteps=1000000, log_interval=1)
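
For reference, here is a minimal standalone sketch of how the linear schedule in the code above behaves over the run. It assumes that the learning-rate callable receives `progress_remaining` decreasing from 1 to 0, and that A2C performs one update per rollout of `n_steps * n_envs` transitions. `linear_schedule` and `count_updates` are hypothetical helper names used only for illustration, not Stable Baselines3 APIs.

```python
import math


def linear_schedule(initial_value):
    """Return a callable mapping progress_remaining (1 -> 0) to a learning rate."""
    def schedule(progress_remaining):
        return initial_value * progress_remaining
    return schedule


def count_updates(total_timesteps, n_steps, n_envs):
    # Assumption: one gradient update per rollout of n_steps * n_envs transitions,
    # with rollouts collected until total_timesteps is reached.
    return math.ceil(total_timesteps / (n_steps * n_envs))


lr = linear_schedule(0.00027232300584036946)
n_updates = count_updates(1_000_000, n_steps=256, n_envs=16)

if __name__ == "__main__":
    print("updates:", n_updates)
    for progress_remaining in (1.0, 0.5, 0.0):
        print(progress_remaining, lr(progress_remaining))
```

Under these assumptions, 1,000,000 steps with `n_steps=256` and `n_envs=16` correspond to 245 updates, with the learning rate falling linearly from its initial value to 0 across them.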

I get these results:
[screenshot: firefox_ETT9C6jsTI]

As the picture shows, the result is a long way from the 456 that RL Baselines3 Zoo reached. I have tried other hyperparameter sets as well, but the scores are always much lower.
The one factor I am aware of that could affect this is the seed, since I did not use the same one. Nevertheless, I have trained many instances of A2C and the problem remains.
