Hi,
I am running your model on Pong and the R2D2 model does not seem to be converging at all. In contrast, your Ape-X implementation works and starts converging nicely after 2-3 hours.
Here are the results from your R2D2 implementation after training for 32 hours on a 1080 Ti with 4 workers:
Note that there are several points where your implementation differs from the papers for both Ape-X and R2D2, such as the worker epsilons being below 0.4 and held constant (which has a significant impact on convergence speed), or the DeepMind R2D2 model taking the last action and last reward as additional inputs.
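For reference, here is a rough sketch of the two paper details I mean; the function and argument names are mine, not from this repo, and the framework choice (PyTorch) is just for illustration:

```python
import torch
import torch.nn.functional as F

# Sketch only -- apex_epsilon / build_core_input are illustrative names,
# not functions from this repository.

def apex_epsilon(worker_id, num_workers, base_eps=0.4, alpha=7.0):
    """Fixed per-actor epsilon from the Ape-X paper:
    eps_i = base_eps ** (1 + worker_id / (num_workers - 1) * alpha)."""
    if num_workers <= 1:
        return base_eps
    return base_eps ** (1 + worker_id / (num_workers - 1) * alpha)

def build_core_input(obs_embedding, prev_action, prev_reward, num_actions):
    """Concatenate the frame embedding with the one-hot previous action
    and the previous reward before feeding the R2D2 recurrent core."""
    one_hot = F.one_hot(prev_action, num_actions).float()
    return torch.cat([obs_embedding, one_hot, prev_reward.unsqueeze(-1)], dim=-1)

# With 4 workers this gives worker 0 eps = 0.4 and worker 3 eps ~= 6.6e-4,
# rather than one shared constant epsilon.
```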
Did you manage to get any convergence yourself? If so, how can I replicate it?