Rewrite PCL agent by lyx-x · Pull Request #245 · chainer/chainerrl

lyx-x · 2018-02-25T20:49:43Z

I rewrote the PCL agent to avoid the memory issues caused by storing Variables in a list or in the replay buffer. I haven't compared the training curve with the old implementation, but the agent appears to learn on CartPole under the new parameters (average_value increases and R gets bigger), and there is no memory issue when running with a large network or reasonably long trajectories.

Main methods are the following:

update: takes a loss (as an array), logs the result as usual, and calls the optimizer (backprop is done before this function is called)
update_on_policy and update_from_replay: sample a list of trajectories (from the replay buffer or the current episode), clear the gradients, and compute the loss
compute_loss: takes a list of trajectories and performs a batch computation (the batch size is the number of episodes, which may be inefficient for an on-policy update, where there is only a single episode). This function calls backward immediately and only returns an array for logging
_compute_path_consistency: computes the path consistency; this part of the code is almost unchanged
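
For readers unfamiliar with the quantity that _compute_path_consistency refers to, here is a minimal pure-Python sketch of the path consistency as defined in the PCL paper. It is not the code from this PR: the function names, the argument layout, and the truncation of the window at the end of the episode are assumptions made for illustration, and the real agent works on batched Chainer Variables.

```python
def path_consistency(values, rewards, log_probs, gamma, tau, d):
    """Path consistency C for every start index of one episode (sketch).

    values:    V(s_0) .. V(s_T); the last entry is the value of the state
               reached after the final step (0.0 if it is terminal).
    rewards:   r_0 .. r_{T-1}
    log_probs: log pi(a_t | s_t) for t = 0 .. T-1
    gamma:     discount factor
    tau:       entropy regularization temperature
    d:         rollout length (assumed to be truncated at the episode end)
    """
    T = len(rewards)
    assert len(values) == T + 1 and len(log_probs) == T
    out = []
    for i in range(T):
        end = min(i + d, T)  # shorten the window near the end of the episode
        n = end - i
        c = -values[i] + gamma ** n * values[end]
        for j in range(n):
            c += gamma ** j * (rewards[i + j] - tau * log_probs[i + j])
        out.append(c)
    return out


def pcl_loss(consistencies):
    """Sum of squared consistencies, scaled by 1/2."""
    return sum(0.5 * c * c for c in consistencies)
```

With a perfectly consistent value function and policy, every entry returned by path_consistency would be zero, so the loss measures how far the current model is from that point.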

The new underlying data structure is a list of dicts that stores the current episode, plus a replay buffer that only stores (s, a, r) tuples. The old mu (action_distrib) is removed since it can be recomputed from the other items.
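
The storage layout described above can be sketched roughly as follows. This is a hypothetical illustration, not the class from this PR: the name EpisodeStore, its methods, and the dict keys are all made up here. The point is that only plain states, actions, and rewards are kept, so no Chainer computational graph stays referenced by the buffer, and anything else (such as the action distribution) is recomputed when the loss is computed.

```python
import random
from collections import deque


class EpisodeStore:
    """Hypothetical container: current episode plus a bounded replay buffer."""

    def __init__(self, capacity):
        # Steps of the episode in progress, one dict per step.
        self.current = []
        # Finished episodes; the oldest one is dropped when capacity is hit.
        self.replay = deque(maxlen=capacity)

    def append(self, state, action, reward):
        # Only raw (s, a, r) data is stored, never model outputs.
        self.current.append(
            {'state': state, 'action': action, 'reward': reward})

    def stop_episode(self):
        # Move the finished episode into the replay buffer.
        if self.current:
            self.replay.append(self.current)
        self.current = []

    def sample(self, n, rng=random):
        # Sample up to n whole episodes without replacement.
        n = min(n, len(self.replay))
        return rng.sample(list(self.replay), n)
```

A short usage example under the same assumptions:

```python
store = EpisodeStore(capacity=2)
for episode in range(3):
    for t in range(4):
        store.append(state=[episode, t], action=0, reward=1.0)
    store.stop_episode()
batch = store.sample(2)  # two of the most recent episodes
```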

I also added a unified model in the example script and changed a couple of parameters.

Issues addressed: #109 #236 #240

I am not sure if the parameters are used correctly, but if they are correct, this PR also addresses #238

muupan · 2018-03-15T05:38:52Z

Thank you for the improvements on PCL. I haven't checked the implementation details yet, but I think solving the memory issue is great as long as it won't make training slow.

Can you show the training curves and computation speeds before and after this PR?

lyx-x and others added 15 commits February 14, 2018 15:37

Change docstring of t_max in PCL and add a new parameter max_len_replay

c8fe4f7

Rename parameters of PCL

b8c069f

Rewrite PCL for memory efficiency

5c81877

Clean up code of PCL agent

cf828b2

Add unified PCL in the example script

ab6cb4f

Use logsumexp to avoid overflow

1cd190d

Style check

f75fcae

Improve path consistency computation

a3035c4

Improve path consistency computation: minibatch

e0a15a2

Improve path consistency computation: minibatch

cd88fa3

Add comments, make the code compatible with the old version

a42e6b3

Change comment style

38dfbde

Fix flake error

c50c160

Fix flake error

9d7318d

Fix errors in the comments

3b4b6da
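
The "Use logsumexp to avoid overflow" commit in the list above refers to a standard numerical trick. The sketch below shows the idea in pure Python; it is not taken from this PR, and where exactly the agent applies it is not stated in this conversation.

```python
import math


def logsumexp(xs):
    """Compute log(sum(exp(x) for x in xs)) without overflowing.

    Subtracting the maximum first keeps every exponent at or below zero,
    so math.exp never sees a large positive argument.
    """
    m = max(xs)
    if math.isinf(m):
        # All entries are -inf, or at least one entry is +inf.
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

A naive math.log(sum(math.exp(x) for x in xs)) raises OverflowError for inputs such as [1000.0, 1000.0], while the version above handles them.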

lyx-x wants to merge 15 commits into chainer:master from lyx-x:pcl
2 participants