[Question] hyperparameter optimization: objective of optuna study  #469

Open
@bias-ster

❓ Question

Hi,

I’ve been adapting your PPO hyperparameter optimization code to my custom environment, and I have a question about the evaluation metric used as the objective.

In exp_manager.py, on line 810, I noticed that the optimization objective is defined using:
reward = eval_callback.last_mean_reward

This means that only the last evaluation is used to decide whether the current trial is the best one. Is there a specific reason for this approach? Would you consider using
reward = eval_callback.best_mean_reward
instead?
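
For context, here is a minimal sketch of the kind of Optuna objective being discussed. It is not the actual exp_manager.py code: CartPole-v1 stands in for the custom environment, and the search space and training budget are purely illustrative. Only the EvalCallback attributes last_mean_reward and best_mean_reward come from Stable-Baselines3 itself.

```python
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.monitor import Monitor


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space -- not the sampler actually used by exp_manager.py.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.9999)

    # CartPole-v1 stands in for the custom environment from the question.
    env = gym.make("CartPole-v1")
    eval_env = Monitor(gym.make("CartPole-v1"))

    model = PPO("MlpPolicy", env, learning_rate=learning_rate, gamma=gamma, verbose=0)

    # EvalCallback tracks both last_mean_reward (most recent evaluation)
    # and best_mean_reward (best evaluation seen so far during the trial).
    eval_callback = EvalCallback(eval_env, n_eval_episodes=5, eval_freq=5_000, verbose=0)
    model.learn(total_timesteps=50_000, callback=eval_callback)

    # exp_manager.py returns the last evaluation; the question asks whether
    # returning eval_callback.best_mean_reward would be preferable.
    return eval_callback.last_mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
print(study.best_params)
```

The difference matters because last_mean_reward reflects performance at the end of training (penalizing runs that regress late), whereas best_mean_reward rewards the single best evaluation, which can be a more optimistic and noisier estimate.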


Labels: duplicate, question
