Migrate obs/reward normalization from env.wrappers into Agent itself

- Store the online statistics as `nn.Parameter`s registered inside the module, so they become trackable as part of the module
  - Similar in style to how BatchNorm is implemented in PyTorch
- Behavior differs between train and eval modes:
  - Train mode: update the statistics
  - Eval mode: use the current statistics without further updating
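
A minimal sketch of what such a module could look like, assuming a batched observation tensor of shape `(batch, *shape)`. The class name `RunningNorm`, the `eps` argument, and the choice of a parallel mean/variance merge are illustrative assumptions, not the existing implementation:

```python
import torch
import torch.nn as nn


class RunningNorm(nn.Module):
    """Hypothetical normalizer that keeps running statistics inside the module."""

    def __init__(self, shape, eps=1e-8):
        super().__init__()
        self.eps = eps
        # Statistics are nn.Parameter with requires_grad=False, so they show up
        # in state_dict() but receive no gradients.
        self.mean = nn.Parameter(torch.zeros(shape), requires_grad=False)
        self.var = nn.Parameter(torch.ones(shape), requires_grad=False)
        self.count = nn.Parameter(torch.zeros(()), requires_grad=False)

    @torch.no_grad()
    def _update(self, x):
        # Merge the batch moments into the running moments (parallel update).
        batch_count = x.shape[0]
        batch_mean = x.mean(dim=0)
        batch_var = ((x - batch_mean) ** 2).mean(dim=0)
        total = self.count + batch_count
        delta = batch_mean - self.mean
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta.pow(2) * self.count * batch_count / total
        )
        self.mean.copy_(new_mean)
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, x):
        # Train mode: update statistics first. Eval mode: statistics are frozen.
        if self.training:
            self._update(x)
        return (x - self.mean) / torch.sqrt(self.var + self.eps)
```

Calling `module.train()` or `module.eval()` on the enclosing Agent would then switch the update behavior along with every other submodule. Note that PyTorch's BatchNorm keeps its running statistics as buffers (`register_buffer`) rather than parameters; both end up in `state_dict()`, so either choice would make the statistics trackable.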

- Unit test: before replacing the wrappers, benchmark against the old behavior
  - MuJoCo environments: HalfCheetah, Hopper, Walker
  - Seeds: 10
  - Confirm no significant effect on performance
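
One possible way to check the last point is a permutation test on the per-seed returns of the old and new implementations. The sketch below is an assumption about how the comparison might be done, not a prescribed procedure; the function name `permutation_test` and its arguments are hypothetical, and the returns would come from the actual benchmark runs:

```python
import random
from statistics import mean


def permutation_test(old_returns, new_returns, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of mean returns."""
    rng = random.Random(seed)
    observed = abs(mean(new_returns) - mean(old_returns))
    pooled = list(old_returns) + list(new_returns)
    n_old = len(old_returns)
    hits = 0
    for _ in range(n_resamples):
        # Reassign seeds to the two groups at random and recompute the gap.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_old:]) - mean(pooled[:n_old]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_resamples + 1)
```

Running this once per environment, with the 10 per-seed returns from each implementation, gives a p-value per environment; a large p-value is consistent with no significant effect, though with only 10 seeds the test has limited power.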

Issue #205
