Inconsistent actions between train and inference on Mario

I trained a policy for the Mario environment with
`python train.py --default --env-id mario --noReward`
and observed a fairly high external reward during training:
```
[2019-03-15 01:24:17,798] True Game terminating: env_episode_reward=0.648666666667 episode_length=669
Episode finished. Sum of shaped rewards: 0.00. Length: 669. Bonus: 4.1677.
```

However, when I run the trained policy through `inference.py` with the following command
`python inference.py --env-id SuperMarioBros-1-1-v0  --default --log-dir ../mario/train`
the agent keeps trying to go left. This makes me think that the action space is inconsistent between training and inference (the action indices seem to be swapped somehow).
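One way to check this hypothesis would be to compare the action tables used by the two code paths index by index. Note that the two commands above pass different `--env-id` values (`mario` vs. `SuperMarioBros-1-1-v0`), so it seems possible that they build differently wrapped environments. The sketch below is only an illustration: `TRAIN_ACTIONS` and `INFER_ACTIONS` are hypothetical placeholder tables and `diff_action_tables` / `remap_indices` are hypothetical helpers, none of them taken from the repository. The real tables would have to be read out of whatever wrappers `train.py` and `inference.py` actually use.

```python
# Hypothetical action tables, for illustration only. The real ones would
# come from the environment wrappers used by train.py and inference.py.
TRAIN_ACTIONS = ["noop", "right", "right+jump", "left", "jump"]
INFER_ACTIONS = ["noop", "left", "left+jump", "right", "jump"]


def diff_action_tables(train, infer):
    """Return (index, train_meaning, infer_meaning) for every index that differs."""
    if len(train) != len(infer):
        raise ValueError(
            "action spaces differ in size: %d vs %d" % (len(train), len(infer))
        )
    return [(i, t, s) for i, (t, s) in enumerate(zip(train, infer)) if t != s]


def remap_indices(train, infer):
    """For each training index, find the inference index with the same meaning.

    Entries are None when the inference table has no equivalent action.
    """
    lookup = {name: i for i, name in enumerate(infer)}
    return [lookup.get(name) for name in train]


mismatches = diff_action_tables(TRAIN_ACTIONS, INFER_ACTIONS)
for index, train_name, infer_name in mismatches:
    print("index %d: train=%s, inference=%s" % (index, train_name, infer_name))

# Training index -> inference index with the same meaning (or None).
mapping = remap_indices(TRAIN_ACTIONS, INFER_ACTIONS)
print(mapping)
```

If the real tables turn out to differ like this, the policy output could be translated through such a mapping before being passed to the inference environment, or inference could be run with the same environment wrapper that was used for training.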

Is there a way to fix it?
