Support external control of stochastic transitions via stochastic_step and action probabilities #1308

@sile16

Introduce additional API control over stochastic steps while preserving backwards compatibility. Environments that currently apply a chance event automatically (e.g. rolling the dice after a player's turn) would continue to do so. A new stochastic_step would let the caller override that automatically sampled outcome with a specific chance action. This is useful for MCTS when the stochastic action space and its probabilities are known: an external search tree can then explore the stochastic action space as well as the deterministic action space. This would apply to 2048, backgammon, bridge and other card games, etc.

  1. New State Attribute: is_stochastic

    • Purpose: Flags a state as having resulted from a stochastic transition that can be overridden.
    • Behavior: Set to True when step() (or init()) executes a move that is followed by an automatic random step.

  2. New Env Attribute: stochastic_action_probs (or a function, if we want to support more games)

    • Purpose: Defines the complete probability distribution over the possible stochastic outcomes.
    • Structure: A static JAX array giving the probability of each "stochastic action".
    • Question: In most games this is static for the whole game, but it may need to be dynamic per state, which is more complicated.

  3. New Env Method: stochastic_step(state, action)

    • Purpose: Allows users to override the internal random outcome of the previous step with a specific chance event.
    • Mechanism: It effectively "rewinds" the random part of the transition and applies the chance event corresponding to the provided action index.
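
To make the three pieces above concrete, here is a minimal sketch on a made-up toy game; it is not the library's actual API. The game, the State fields (in particular pre_chance_total, which stores the position before the chance event so it can be "rewound"), and the helper expected_total are all assumptions for illustration only.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    total: jnp.ndarray             # score after the last full transition
    pre_chance_total: jnp.ndarray  # hypothetical: score before the chance event
    is_stochastic: jnp.ndarray     # True if the last transition included a chance event
    key: jnp.ndarray               # PRNG key carried in the state


# Hypothetical toy distribution: the chance event adds 0, 1, or 2 to the score.
STOCHASTIC_ACTION_PROBS = jnp.array([0.5, 0.3, 0.2])


def step(state: State, action) -> State:
    """Backwards-compatible step: player move, then an automatic chance event."""
    key, subkey = jax.random.split(state.key)
    pre = state.total + action
    chance = jax.random.choice(
        subkey, STOCHASTIC_ACTION_PROBS.shape[0], p=STOCHASTIC_ACTION_PROBS
    )
    return State(
        total=pre + chance,
        pre_chance_total=pre,
        is_stochastic=jnp.array(True),
        key=key,
    )


def stochastic_step(state: State, chance_action) -> State:
    """Replace the sampled chance outcome with the given chance action index."""
    overridden = state.pre_chance_total + chance_action
    # Only override when the previous transition actually had a chance event.
    return state._replace(total=jnp.where(state.is_stochastic, overridden, state.total))


def expected_total(state: State) -> jnp.ndarray:
    """Expand every chance outcome (as an MCTS chance node would) and average."""
    outcomes = jnp.arange(STOCHASTIC_ACTION_PROBS.shape[0])
    children = jax.vmap(lambda a: stochastic_step(state, a))(outcomes)
    return jnp.sum(STOCHASTIC_ACTION_PROBS * children.total)
```

In this sketch an external search would call step() once for the player's move, then call stochastic_step() on the result once per chance action to enumerate the children of the chance node, weighting each child by stochastic_action_probs. Storing the pre-chance position in the state is only one possible way to implement the "rewind"; a real environment might instead recompute it or keep the chance event separable in some other way.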

I wanted to discuss whether this is something we could add, and I'm excited to get feedback on the idea. I can open PRs adding these changes to backgammon and 2048.

Questions, Alternatives, feedback:

  • Rather than providing backwards compatibility, we could make the stochastic step explicit. That would be cleaner from a coding perspective, but it would break backwards compatibility for 2048, backgammon, and bridge.
  • This assumes a single, fixed action probability distribution for the entire game, which may not be flexible enough for bridge and other card games, where the probabilities change after each stochastic state change. Perhaps we could add a mask such as legal_stochastic_action?
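
As a rough illustration of the mask idea in the last bullet, the sketch below keeps one static base distribution and derives a per-state distribution by zeroing out chance actions that are no longer possible and renormalizing. The function name masked_stochastic_probs and the four-card example are hypothetical; a real card game may need a richer scheme than a boolean mask.

```python
import jax.numpy as jnp


def masked_stochastic_probs(base_probs: jnp.ndarray, legal_mask: jnp.ndarray) -> jnp.ndarray:
    """Zero out illegal chance actions and renormalize the remaining mass."""
    masked = jnp.where(legal_mask, base_probs, 0.0)
    total = masked.sum()
    # Guard against division by zero when no chance action is legal.
    return jnp.where(total > 0, masked / jnp.maximum(total, 1e-12), masked)


# Hypothetical example: four equally likely cards, one of which was already drawn.
base = jnp.full(4, 0.25)
legal_stochastic_action = jnp.array([True, False, True, True])
probs = masked_stochastic_probs(base, legal_stochastic_action)
```

Here the drawn card gets probability zero and the three remaining cards each get one third, so the static array plus a per-state mask covers the simple "drawing without replacement" case without making stochastic_action_probs itself a function of the state.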
