Migrate obs/reward normalization from env.wrappers into Agent itself #205

@zuoxingdong

Description

  • Store the online statistics as nn.Parameter objects registered inside the module, so that they become trackable (for example, saved together with the module's state).

    • This is similar in style to how BatchNorm is implemented in PyTorch
  • Behave differently in train and eval modes.

    • Train mode: update the statistics
    • Eval mode: use the current statistics without updating them further
  • Unit test: before replacing the wrappers, benchmark against the old behavior

    • MuJoCo environments: HalfCheetah, Hopper, Walker
    • Seeds: 10
    • Confirm that there is no significant effect on performance
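
The first two points could be sketched roughly as follows. This is a minimal illustration, not the implementation planned for this issue: the class name `RunningNorm`, its arguments, and the batch-merge update rule are all assumptions. The statistics are kept as `nn.Parameter` with `requires_grad=False`, as the issue suggests; note that PyTorch's own BatchNorm stores its running statistics with `register_buffer` instead, which would work equally well here.

```python
import torch
import torch.nn as nn


class RunningNorm(nn.Module):
    """Hypothetical normalizer that tracks a running mean/variance."""

    def __init__(self, shape, eps=1e-8):
        super().__init__()
        self.eps = eps
        # Non-trainable parameters: they appear in state_dict() but never
        # receive gradients, so an optimizer leaves them untouched.
        self.mean = nn.Parameter(torch.zeros(shape), requires_grad=False)
        self.var = nn.Parameter(torch.ones(shape), requires_grad=False)
        self.count = nn.Parameter(torch.zeros(()), requires_grad=False)

    @torch.no_grad()
    def _update(self, x):
        # Merge the statistics of a batch (first dim = batch) into the
        # running statistics using the parallel mean/variance formula.
        batch_n = x.shape[0]
        batch_mean = x.mean(dim=0)
        batch_var = ((x - batch_mean) ** 2).mean(dim=0)
        total = self.count + batch_n
        delta = batch_mean - self.mean
        new_mean = self.mean + delta * batch_n / total
        m2 = (self.var * self.count + batch_var * batch_n
              + delta ** 2 * self.count * batch_n / total)
        self.mean.copy_(new_mean)
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, x):
        # Train mode: update the statistics first. Eval mode: use them as-is.
        if self.training:
            self._update(x)
        return (x - self.mean) / torch.sqrt(self.var + self.eps)
```

With this layout, calling `agent.train()` or `agent.eval()` on a parent module would switch the normalizer's behavior as well, because `self.training` is propagated to submodules.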

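
One possible way to check "no significant effect" across the 10 seeds is a Welch t-test on the final returns of the old and new implementations, per environment. The sketch below is only an assumption about how the comparison might be done; the function name `welch_t` is hypothetical, and the issue does not specify a statistical test.

```python
import math
import statistics


def welch_t(old_returns, new_returns):
    """Return (t statistic, degrees of freedom) for Welch's t-test.

    Each argument is a list of final returns, one entry per seed.
    """
    n_old, n_new = len(old_returns), len(new_returns)
    # Sample variances (n - 1 in the denominator), scaled by sample size.
    se_old = statistics.variance(old_returns) / n_old
    se_new = statistics.variance(new_returns) / n_new
    diff = statistics.mean(new_returns) - statistics.mean(old_returns)
    t = diff / math.sqrt(se_old + se_new)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_old + se_new) ** 2 / (
        se_old ** 2 / (n_old - 1) + se_new ** 2 / (n_new - 1)
    )
    return t, df
```

A t statistic close to zero relative to the critical value for the returned degrees of freedom would be consistent with the migration not changing performance. The input lists would come from the actual benchmark runs.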