[WIP] Implements Hindsight Experience Replay #361
prabhatnagarajan wants to merge 62 commits into chainer:master from
Conversation
Here are a couple of differences from the original paper that I noticed. Please verify / be advised of the following:
chainerrl/agents/ddpg.py
Outdated

```python
if self.obs_normalizer:
    batch['state'] = self.obs_normalizer(batch['state'],
                                         update=False)
    batch['next_state'] = self.obs_normalizer(batch['state'],
```
Shouldn't this be

```diff
-    batch['next_state'] = self.obs_normalizer(batch['state'],
+    batch['next_state'] = self.obs_normalizer(batch['next_state'],
```
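To illustrate the bug: since both keys are fed `batch['state']`, the normalized `next_state` silently ends up holding the current observations. A minimal numpy sketch, with a hypothetical `normalize` helper standing in for the observation normalizer (not the chainerrl API):

```python
import numpy as np

def normalize(x, mean, std):
    # hypothetical stand-in for self.obs_normalizer
    return (x - mean) / std

batch = {
    'state': np.array([0.0, 2.0], dtype=np.float32),
    'next_state': np.array([4.0, 6.0], dtype=np.float32),
}
mean, std = 3.0, 1.0

# what the current code effectively does -- next_state gets state's values:
buggy_next = normalize(batch['state'], mean, std)

# the suggested fix normalizes the actual next observations:
fixed_next = normalize(batch['next_state'], mean, std)
```

The two results differ whenever `state != next_state`, which is essentially every transition, so the TD target would be computed from mis-normalized next observations.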
chainerrl/agents/ddpg.py
Outdated

```python
loss = - F.sum(q) / batch_size
if self.l2_action_penalty:
    loss += self.l2_action_penalty \
        * F.square(onpolicy_actions) / batch_size
```
Should this also include an F.sum term around the F.square?
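For what it's worth, `F.square` is elementwise, so without a reduction the penalty term keeps the actions' `(batch, action_dim)` shape instead of being a scalar like the `-F.sum(q) / batch_size` term it is added to. A minimal numpy sketch of the shapes involved, with plain `** 2` and `.sum()` standing in for `F.square` and `F.sum`:

```python
import numpy as np

batch_size = 2
onpolicy_actions = np.array([[0.5, -1.0],
                             [2.0, 0.0]], dtype=np.float32)

# elementwise square keeps the (batch, action_dim) shape:
squared = onpolicy_actions ** 2  # analogue of F.square

# reducing with a sum (as F.sum would) yields the scalar the loss needs:
penalty = squared.sum() / batch_size
```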