Support external control of stochastic transitions via stochastic_step and action probabilities

Introduce additional API control for stochastic steps while keeping backwards compatibility. Actions that currently trigger a stochastic transition automatically (e.g. rolling the dice after a player's turn) will continue to do so. `stochastic_step` would override the automatically sampled outcome with a specific chance action. This is useful for MCTS, where the stochastic action space and its probabilities are known: an external search tree can then explore the stochastic action space as well as the deterministic one. Target games include 2048, backgammon, bridge and other card games.
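
To illustrate why a search tree wants the chance distribution, here is a minimal sketch of a chance-node backup for 2048. It is not library code: `spawn_distribution` and `chance_node_value` are hypothetical names, the board is a flat list rather than a JAX array, and the 90% / 10% split between a 2 and a 4 tile is assumed from the standard 2048 rules.

```python
from fractions import Fraction

# Assumption: standard 2048 spawn rule (a 2 with probability 0.9, a 4 with 0.1),
# placed uniformly at random on an empty cell.
TILE_PROBS = {2: Fraction(9, 10), 4: Fraction(1, 10)}


def spawn_distribution(board):
    """Map each (cell, tile) chance outcome to its probability.

    `board` is a flat list of ints where 0 marks an empty cell.
    """
    empty = [i for i, v in enumerate(board) if v == 0]
    if not empty:
        return {}
    return {
        (cell, tile): p / len(empty)
        for cell in empty
        for tile, p in TILE_PROBS.items()
    }


def chance_node_value(board, value_fn):
    """Probability-weighted average of `value_fn` over all spawn outcomes."""
    total = Fraction(0)
    for (cell, tile), p in spawn_distribution(board).items():
        child = list(board)
        child[cell] = tile  # apply one specific chance outcome
        total += p * value_fn(child)
    return total


board = [2] * 14 + [0, 0]
dist = spawn_distribution(board)
expected = chance_node_value(board, sum)
```

With automatic sampling only, a tree sees one of these children per `step()` call; with the outcomes and probabilities exposed, it can expand or sample all of them deliberately.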

1. New `State` Attribute: `is_stochastic`
   * Purpose: Flags a state as having resulted from a stochastic transition that can be overridden.
   * Behavior: `True` when `step()` (or `init()`) executes a move that includes an automatic random step after the action.

2. New `Env` Attribute: `stochastic_action_probs` (or a function, if we want to support more games)
   * Purpose: Defines the complete probability distribution over possible stochastic outcomes.
   * Structure: A static JAX array holding the probability of each "stochastic action".
   * Question: In most games this is static for the whole game, but it may need to be dynamic per state, which is more complicated.

3. New `Env` Method: `stochastic_step(state, action)`
   * Purpose: Allows users to override the internal random outcome of the previous step with a specific random event.
   * Mechanism: It effectively "rewinds" the random part of the transition and applies the specific chance event corresponding to the provided action index.
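
The three pieces above could fit together roughly as follows. This is a pure-Python sketch on a made-up dice game, not the proposed implementation: `ToyDiceEnv`, the `position` and `roll` fields, and the explicit `rng` argument are all hypothetical, and a real environment would use JAX arrays and PRNG keys instead.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    position: int = 0            # includes the most recent dice roll
    roll: int = 0                # last chance outcome (0 = none yet)
    is_stochastic: bool = False  # True if the last transition had a random part


class ToyDiceEnv:
    # One probability per stochastic action (dice faces 1..6), static for the game.
    stochastic_action_probs = (1 / 6,) * 6

    def step(self, state, action, rng):
        """Existing behaviour: apply the move, then roll the dice automatically."""
        moved = replace(state, position=state.position + action, roll=0)
        chance = rng.choices(range(6), weights=self.stochastic_action_probs)[0]
        return self._apply_chance(moved, chance)

    def stochastic_step(self, state, chance_action):
        """Override the random part of the previous step with `chance_action`."""
        if not state.is_stochastic:
            raise ValueError("previous transition had no stochastic part")
        # "Rewind": undo the sampled roll, keeping the deterministic move.
        rewound = replace(state, position=state.position - state.roll, roll=0)
        return self._apply_chance(rewound, chance_action)

    def _apply_chance(self, state, chance_action):
        roll = chance_action + 1  # action index 0..5 maps to face 1..6
        return replace(
            state, position=state.position + roll, roll=roll, is_stochastic=True
        )


env = ToyDiceEnv()
s = env.step(State(), action=2, rng=random.Random(0))  # backwards-compatible call
# An external tree can enumerate every chance outcome with its probability.
children = [
    (p, env.stochastic_step(s, a))
    for a, p in enumerate(env.stochastic_action_probs)
]
```

In this toy game the rewind is a subtraction; in a game like 2048 it would mean removing the spawned tile, so the state has to retain enough information to undo the random part.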
   

I'd like to discuss whether this is something we could add, and I'm excited to get feedback on the idea. I can open PRs adding these changes to backgammon and 2048.

Questions, alternatives, feedback:
 - Rather than preserving backwards compatibility, we could make the change explicit. That would be simpler to implement but would break backwards compatibility for 2048, backgammon, and bridge.
 - This assumes a single action-probability distribution for the entire game, which may not be flexible enough for bridge and other card games, where the probabilities change after each stochastic transition. Perhaps we could add a mask such as `legal_stochastic_action`?
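
One way the mask idea might work is to keep the static distribution and renormalize it over the outcomes that are still possible. The sketch below assumes exactly that; `masked_chance_probs` is a hypothetical helper, and renormalizing is only correct when the remaining outcomes keep their relative weights (for example, drawing from a deck with some cards already dealt).

```python
def masked_chance_probs(base_probs, legal_stochastic_action):
    """Zero out illegal chance outcomes and renormalize the rest."""
    masked = [
        p if legal else 0.0
        for p, legal in zip(base_probs, legal_stochastic_action)
    ]
    total = sum(masked)
    if total == 0:
        raise ValueError("no legal stochastic action in this state")
    return [p / total for p in masked]


# Four equally likely cards, two of which have already been dealt.
probs = masked_chance_probs([0.25] * 4, [True, False, True, False])
```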

Support external control of stochastic transitions via stochastic_step and action probabilities #1308
