Some questions about "train_baseline.py"

I ran `train_baseline.py`, and after some iterations I got output like this:
![Screenshot 2020-09-21 155836](https://user-images.githubusercontent.com/35218694/93743800-47fc6680-fc23-11ea-9907-c15363436540.png)

The `policy_reward_mean` always equals `0`. I am not sure whether this is the expected result.