[Question] exp_manager reward and GAE discount factors #442

Open
@edmund735

Description

❓ Question

Hi,

I noticed that the algorithm discount factor and reward discount factor are set to be the same in lines 365-367 in rl_zoo3/exp_manager.py

```python
# Use the same discount factor as for the algorithm
if "gamma" in hyperparams:
    self.normalize_kwargs["gamma"] = hyperparams["gamma"]
```

For PPO, does this mean the discount factor used for GAE is the same as the reward discount factor? I'm currently training an episodic environment with PPO (episodes always reach a termination state within 100 time steps, so there is never truncation), and I'd like to use a reward discount factor of 1. In that case, if I want to tune the GAE discount factor as a hyperparameter, should I remove this line so that the VecNormalize object is created with a discount factor of 1 (different from the GAE discount factor)?

Also, why are the discount factors matched only when normalize=True? Isn't it possible to still have a reward discount factor without normalization? I read #64 ("gamma is the only one we override automatically for correctness (and only if present in the hyperparameters)") and I don't think I understand what "correctness" means in this case. Any further explanation would be very helpful.
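For context, my understanding (an assumption on my part, sketched here rather than the actual SB3 implementation) is that reward normalization tracks a running discounted return with its own gamma, and normalizes rewards by the standard deviation of that return. If that gamma differs from the algorithm's gamma, the normalizer is scaling rewards by the spread of a quantity the algorithm never optimizes:

```python
def running_discounted_returns(rewards, gamma):
    """Running discounted return, as a reward normalizer might track it.

    Hypothetical helper for illustration only, not code from rl_zoo3
    or stable-baselines3.
    """
    ret = 0.0
    returns = []
    for r in rewards:
        ret = ret * gamma + r
        returns.append(ret)
    return returns


rewards = [1.0] * 5
# With gamma=1 the tracked return grows linearly with episode length...
print(running_discounted_returns(rewards, gamma=1.0))   # [1.0, 2.0, 3.0, 4.0, 5.0]
# ...while with gamma=0.99 it converges toward 1 / (1 - 0.99) = 100,
# so the two gammas imply very different normalization scales.
print(running_discounted_returns(rewards, gamma=0.99))
```

If that picture is right, it would explain the "correctness" remark: the normalizer's gamma should match the algorithm's so the normalization scale reflects the return the algorithm actually sees. But I'd appreciate confirmation.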

Thanks!
