This approach uses a prior strategy to unroll the game up to a time step t, then hands control to the exploration strategy being trained. As the exploration strategy improves, t is gradually reduced, so it takes over earlier in the game.
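One way the reduction of t could be driven is by the exploration strategy's recent win rate. The sketch below is only an illustration under that assumption: the class name `HandoverSchedule` and the parameters `window`, `win_rate_threshold` and `step` are hypothetical and not part of the plan above.

```python
from collections import deque


class HandoverSchedule:
    """Tracks the handover time t and lowers it as results improve.

    Assumption: "improves" is measured as the win rate over the last
    `window` episodes; the plan above does not prescribe a criterion.
    """

    def __init__(self, t_start, window=100, win_rate_threshold=0.6, step=1):
        self.t = t_start
        self.step = step
        self.win_rate_threshold = win_rate_threshold
        self.results = deque(maxlen=window)

    def record(self, won):
        """Record one episode result and return t for the next episode."""
        self.results.append(1 if won else 0)
        window_full = len(self.results) == self.results.maxlen
        if window_full:
            win_rate = sum(self.results) / len(self.results)
            if win_rate >= self.win_rate_threshold:
                # Hand over earlier, never before the start of the game.
                self.t = max(0, self.t - self.step)
                # Measure the win rate afresh at the new, harder t.
                self.results.clear()
        return self.t
```

Clearing the window after each reduction means t only drops again once the exploration strategy has shown the same win rate from the earlier handover point.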
To generate each game_state(t), we intend to perform the following steps:
- Choose a prior strategy that can be configured to play either deterministically or non-deterministically.
- Play the non-deterministic version of the prior strategy, either against itself or against a deterministic version of itself, up to time t.
- Determine which player the deterministic prior strategy favours in the resulting position.
- Play the exploration strategy against the deterministic prior strategy, with the exploration strategy playing as the favoured player to ensure it has a chance of winning.
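
The steps above can be sketched on a toy game. Everything in this example is an assumption made for illustration: the game (a single pile of stones, take 1 to 3 per turn, whoever takes the last stone wins), the function names, the `epsilon` parameter that makes the prior non-deterministic, and the choice to determine the favoured player by letting the deterministic prior play itself out from the position.

```python
import random


def prior_move(stones, rng=None, epsilon=0.0):
    """Toy prior strategy: deterministic when epsilon is 0 (or rng is None),
    otherwise plays a random legal move with probability epsilon."""
    legal = list(range(1, min(3, stones) + 1))
    if rng is not None and rng.random() < epsilon:
        return rng.choice(legal)
    # Deterministic rule: leave a multiple of 4 if possible, else take 1.
    return stones % 4 or 1


def unroll(t, rng, start_stones=21, epsilon=0.5):
    """Play the non-deterministic prior against itself for t moves."""
    stones, to_move = start_stones, 0
    for _ in range(t):
        stones -= prior_move(stones, rng, epsilon)
        to_move = 1 - to_move
        if stones == 0:
            raise ValueError("game ended before time t; choose a smaller t")
    return stones, to_move


def favoured_player(stones, to_move):
    """Assumed criterion: the winner of deterministic prior self-play."""
    while True:
        stones -= prior_move(stones)
        if stones == 0:
            return to_move
        to_move = 1 - to_move


def play_episode(explore_move, stones, to_move, explorer):
    """Exploration strategy (as `explorer`) versus the deterministic prior.

    Returns True if the exploration strategy wins.
    """
    while True:
        if to_move == explorer:
            move = explore_move(stones)
        else:
            move = prior_move(stones)
        stones -= move
        if stones == 0:
            return to_move == explorer
        to_move = 1 - to_move


rng = random.Random(0)
stones, to_move = unroll(t=4, rng=rng)
explorer = favoured_player(stones, to_move)
# Stand-in exploration strategy: here simply the deterministic prior.
won = play_episode(prior_move, stones, to_move, explorer)
```

Under this criterion, the favoured side is known to have at least one winning line against the deterministic prior (the prior's own line of play), which is what gives the exploration strategy a chance of winning.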
Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1