Implement a JSRL-like training strategy

This approach uses a `prior strategy` to unroll the game up to a certain point in time `t`, after which the `exploration strategy` being trained takes over. `t` is then gradually reduced as the `exploration strategy` improves.
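The issue does not say exactly when `t` should be reduced. One simple option, which is an assumption rather than something decided here, is to check the exploration strategy's win rate over a fixed window of episodes and move `t` back by a fixed step whenever that rate reaches a threshold. The class `HandOverSchedule` and its parameters (`step`, `threshold`, `window`) are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class HandOverSchedule:
    """Hypothetical curriculum for the hand-over time t.

    After every `window` exploration episodes, t is reduced by `step`
    if the exploration strategy won at least `threshold` of them.
    The default values are assumptions, not tuned settings.
    """
    t: int
    step: int = 1
    threshold: float = 0.8
    window: int = 100
    _results: list[bool] = field(default_factory=list, init=False, repr=False)

    def record(self, won: bool) -> int:
        """Record one episode outcome and return the (possibly updated) t."""
        self._results.append(won)
        if len(self._results) == self.window:
            win_rate = sum(self._results) / self.window
            if win_rate >= self.threshold:
                # Hand control to the exploration strategy earlier next time.
                self.t = max(0, self.t - self.step)
            # Evaluate the next window from scratch.
            self._results.clear()
        return self.t
```

Using non-overlapping windows keeps the rule easy to reason about; a moving average would react faster but is noisier.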

To generate the starting states `game_state(t)`, we intend to perform the following steps:

* Choose a `prior strategy` that can be configured to play either deterministically or non-deterministically.
* Play the non-deterministic version of the `prior strategy`, either against itself or against its deterministic version, up to time `t`.
* Determine which player is favoured in `game_state(t)` according to the deterministic `prior strategy`.
* Play the `exploration strategy` against the deterministic `prior strategy`, with the `exploration strategy` taking the favoured player's side to ensure it has a chance of winning.
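The steps above can be sketched end to end on a toy game. Everything game-specific here is an assumption for illustration: a pile of stones from which players alternately take 1 to 3, where taking the last stone wins. `PriorStrategy`, `rollout`, `favoured_player` and `jump_start_episode` are hypothetical names; the "deterministic vs non-deterministic" prior is modelled as a fixed heuristic with an `epsilon` chance of a random move, and "favoured" is taken to mean "wins when the deterministic prior plays both sides from `game_state(t)`".

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


# Toy game (assumption): take 1-3 stones per turn; taking the last stone wins.
@dataclass(frozen=True)
class State:
    stones: int
    to_move: int  # 0 or 1


Policy = Callable[[State], int]


def legal_moves(s: State) -> list[int]:
    return [k for k in (1, 2, 3) if k <= s.stones]


def apply_move(s: State, k: int) -> State:
    return State(s.stones - k, 1 - s.to_move)


def winner(s: State) -> int:
    # The player who made the last move (took the last stone) wins.
    return 1 - s.to_move


class PriorStrategy:
    """Hypothetical prior: epsilon=0 is deterministic, epsilon>0 adds random moves."""

    def __init__(self, epsilon: float = 0.0, seed: Optional[int] = None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def __call__(self, s: State) -> int:
        moves = legal_moves(s)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(moves)
        k = s.stones % 4  # leave a multiple of 4 for the opponent when possible
        return k if k in moves else 1


def rollout(s: State, p0: Policy, p1: Policy, max_steps: Optional[int] = None) -> State:
    """Play from s until the game ends or max_steps moves have been made."""
    steps = 0
    while s.stones > 0 and (max_steps is None or steps < max_steps):
        policy = p0 if s.to_move == 0 else p1
        s = apply_move(s, policy(s))
        steps += 1
    return s


def favoured_player(s: State, deterministic_prior: Policy) -> int:
    """The player who wins when the deterministic prior plays both sides from s."""
    return winner(rollout(s, deterministic_prior, deterministic_prior))


def jump_start_episode(start: State, t: int, stochastic_prior: Policy, opponent: Policy,
                       deterministic_prior: Policy, explorer: Policy) -> tuple[State, int, int]:
    # 1. Unroll to game_state(t): the non-deterministic prior plays against
    #    `opponent` (itself or the deterministic prior).
    s_t = rollout(start, stochastic_prior, opponent, max_steps=t)
    # 2. Ask the deterministic prior which side is favoured from game_state(t).
    fav = favoured_player(s_t, deterministic_prior)
    # 3. The explorer takes the favoured side; the deterministic prior plays the other.
    p0, p1 = (explorer, deterministic_prior) if fav == 0 else (deterministic_prior, explorer)
    final = rollout(s_t, p0, p1)
    return s_t, fav, winner(final)
```

For example, `jump_start_episode(State(15, 0), t, PriorStrategy(0.3), PriorStrategy(0.0), PriorStrategy(0.0), explorer)` with `explorer = lambda s: random.choice(legal_moves(s))` produces one training episode; the returned winner can be fed back to whatever schedule decides when to reduce `t`.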

Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1
