Benchmark and replicate algorithm performance #388

Tune hyperparameters, match implementation details, and fix bugs until we replicate the performance of the reference implementations of the algorithms. I'm not concerned about an exact match: if we do about as well on average, doing better on some environments and worse on others, that seems OK.
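One way to make "about as well on average" concrete is the minimal Python sketch below. It rests on assumptions that are not part of this issue. First, each environment's mean return is normalized so that a random policy scores 0.0 and the reference implementation scores 1.0. Second, a run counts as a match if the mean normalized score across environments is at least `1 - tolerance`, where `tolerance = 0.1` is an arbitrary placeholder. The function names `normalized_score` and `matches_reference` are hypothetical.

```python
from statistics import mean
from typing import Mapping


def normalized_score(ours: float, reference: float, random: float) -> float:
    """Rescale a mean return so a random policy scores 0.0 and the reference scores 1.0."""
    if reference == random:
        raise ValueError("reference and random returns must differ to normalize")
    return (ours - random) / (reference - random)


def matches_reference(
    ours: Mapping[str, float],
    reference: Mapping[str, float],
    random: Mapping[str, float],
    tolerance: float = 0.1,  # hypothetical threshold, not taken from the issue
) -> bool:
    """Return True if the mean normalized score across environments is >= 1 - tolerance.

    Individual environments may score above or below the reference (1.0);
    only the average across environments is checked, so beating the
    reference on average also passes.
    """
    scores = [
        normalized_score(ours[env], reference[env], random[env]) for env in reference
    ]
    return mean(scores) >= 1.0 - tolerance
```

Averaging normalized scores rather than raw returns stops environments with large reward scales from dominating the comparison. It also still works for environments with negative returns, because the score depends only on where our return sits between the random and reference returns.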

Concretely, we should test BC, AIRL, GAIL, DRLHP, and DAgger on at least the seals versions of CartPole, MountainCar, HalfCheetah, and Hopper.

Baselines: paper results are the first port of call. However, some paper results are confounded by differences in environment version, especially fixed vs. variable horizon. SB2 GAIL is a good sanity check. If need be, reference implementations exist for most of the other algorithms, though they can be hard to run.
