batch update (figure 6.2)

the notebook doesn't currently reproduce figure 6.2, which uses batch updating (all episodes stored in an experience buffer are replayed repeatedly until the value estimates converge). as far as i know, `RL.jl` doesn't support this out of the box since per-episode info isn't saved, but from a look at `RLTrajectories.jl`, it should make this task a lot easier.
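
for reference, here is a rough, library-agnostic sketch of what batch TD(0) looks like on a 5-state random walk. it's written in plain python and does not use `RL.jl` or `RLTrajectories.jl`; every name in it (`generate_episode`, `batch_td0`, the `(state, reward, next_state)` tuple layout) is hypothetical, and the step size, tolerance and sweep cap are arbitrary assumptions rather than the settings used for the figure.

```python
import random

N_STATES = 5  # non-terminal states are 1..5; 0 and 6 are terminal (assumed layout)


def generate_episode(rng):
    """Return one random-walk episode as a list of (state, reward, next_state)."""
    state = 3  # start in the middle state
    transitions = []
    while 0 < state < N_STATES + 1:
        next_state = state + rng.choice((-1, 1))
        # reward 1 only when stepping off the right end, 0 otherwise
        reward = 1.0 if next_state == N_STATES + 1 else 0.0
        transitions.append((state, reward, next_state))
        state = next_state
    return transitions


def batch_td0(episodes, values, alpha=0.001, tol=1e-7, max_sweeps=100_000):
    """Batch TD(0): replay every stored episode, sum the increments,
    apply them once per sweep, and stop when the increments are tiny.

    `values` has one entry per state (terminals included, fixed at 0)
    and is updated in place. alpha must be small relative to the number
    of stored visits per state, otherwise the sweeps can diverge.
    """
    for _ in range(max_sweeps):
        increments = [0.0] * len(values)
        for episode in episodes:
            for s, r, s_next in episode:
                # undiscounted TD(0) error, accumulated but not yet applied
                increments[s] += alpha * (r + values[s_next] - values[s])
        for s in range(1, N_STATES + 1):
            values[s] += increments[s]
        if max(abs(d) for d in increments) < tol:
            break
    return values


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = []  # stands in for an experience buffer that keeps whole episodes
    values = [0.0] + [0.5] * N_STATES + [0.0]
    for _ in range(20):
        buffer.append(generate_episode(rng))
        batch_td0(buffer, values)  # re-converge on everything seen so far
    print(values[1:-1])
```

the key requirement this puts on the library side is that the buffer keeps complete episodes around so they can be replayed, which is the part that isn't available today.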