R2D2 not converging? #2

Open
@NikEyX

Hi,

I am running your model on Pong, and the R2D2 model does not seem to be converging at all. In contrast, your Ape-X implementation works and starts converging nicely after 2-3 hours.

Here are the results of your R2D2 implementation after training for 32 hours on a 1080 Ti with 4 workers:

[image]

Note that several aspects of your implementation differ from the papers for both Ape-X and R2D2. For example, the worker epsilons are all below 0.4 and always constant (which has a significant impact on convergence speed), and the DeepMind R2D2 model takes the last action and last reward as additional inputs.
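To make the two differences concrete, here is a minimal sketch of what I mean. It assumes the per-actor schedule eps_i = eps^(1 + alpha * i / (N - 1)) with eps = 0.4 and alpha = 7, as I understand it from the Ape-X paper, and it assumes the last action is fed in as a one-hot vector next to the last reward. The function names `actor_epsilons` and `augment_features` are hypothetical and do not come from this repository.

```python
from typing import List


def actor_epsilons(num_actors: int, base_eps: float = 0.4, alpha: float = 7.0) -> List[float]:
    """Per-actor exploration rates, assuming eps_i = base_eps ** (1 + alpha * i / (N - 1)).

    Each actor keeps its own epsilon fixed during training, but the values
    differ across actors, from base_eps down to base_eps ** (1 + alpha).
    """
    if num_actors < 1:
        raise ValueError("num_actors must be at least 1")
    if num_actors == 1:
        # The formula divides by N - 1, so handle the single-actor case directly.
        return [base_eps]
    return [
        base_eps ** (1.0 + alpha * i / (num_actors - 1))
        for i in range(num_actors)
    ]


def augment_features(
    features: List[float],
    last_action: int,
    last_reward: float,
    num_actions: int,
) -> List[float]:
    """Append a one-hot encoding of the last action and the last reward.

    Assumption: the result is what would be passed to the recurrent core
    in place of the plain feature vector.
    """
    if not 0 <= last_action < num_actions:
        raise ValueError("last_action out of range")
    one_hot = [0.0] * num_actions
    one_hot[last_action] = 1.0
    return list(features) + one_hot + [float(last_reward)]


if __name__ == "__main__":
    # With 4 workers, the epsilons range from 0.4 down to 0.4 ** 8.
    for i, eps in enumerate(actor_epsilons(4)):
        print(f"actor {i}: epsilon = {eps:.6f}")

    # Pong in Gym exposes 6 discrete actions; the feature values are dummies.
    print(augment_features([0.1, 0.2], last_action=2, last_reward=1.0, num_actions=6))
```

With only 4 workers, this schedule covers a much coarser range of exploration rates than the large actor counts used in the papers, so it may still behave differently from the published results.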

Did you manage to get any convergence yourself? If so, how can I replicate it?

Labels: help wanted (extra attention is needed)
