Description
Introduce additional API control over stochastic steps while preserving backwards compatibility. Actions that currently trigger a random event automatically will continue to do so, e.g. after a player's turn, the dice are rolled. A new stochastic step would override that automatic random outcome with a specific chance action. This is useful for MCTS, where the stochastic action space and its probabilities are known: an external MCTS tree could then explore the stochastic action space as well as the deterministic one. Target games include 2048, backgammon, bridge/card games, etc.
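To illustrate why known chance probabilities help an external search, here is a minimal sketch of evaluating an MCTS chance node by expanding every outcome and weighting the child values by their probabilities. Everything here is a hypothetical toy (a fair six-sided die added to a scalar score, with `tanh` standing in for a value estimate); none of these names are pgx APIs.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy chance event (not a pgx API): a fair six-sided die,
# where outcome index i means the die shows i + 1.
NUM_OUTCOMES = 6
OUTCOME_PROBS = jnp.full((NUM_OUTCOMES,), 1.0 / NUM_OUTCOMES)


def apply_outcome(score, outcome):
    # Toy chance transition: add the rolled value to a running score.
    return score + (outcome + 1)


def leaf_value(score):
    # Toy value estimate standing in for a network or a rollout.
    return jnp.tanh(score / 10.0)


def chance_node_value(score):
    # Expand every chance outcome, then weight child values by their probabilities.
    outcomes = jnp.arange(NUM_OUTCOMES)
    children = jax.vmap(apply_outcome, in_axes=(None, 0))(score, outcomes)
    child_values = jax.vmap(leaf_value)(children)
    return jnp.sum(OUTCOME_PROBS * child_values)


print(jax.jit(chance_node_value)(jnp.float32(3.0)))
```

Because the outcomes are enumerated with `jax.vmap`, the whole chance node stays jit-compatible; this kind of exhaustive expansion is only possible when the search can choose the chance outcome instead of receiving a random one.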
- New `State` attribute: `is_stochastic`
  - Purpose: Flags a state as having resulted from a stochastic transition that can be overridden.
  - Behavior: `True` when `step()` (or `init()`) executes a move that includes an automatic random step after the action.
- New `Env` attribute: `stochastic_action_probs` (or a function, if we want to support more games)
  - Purpose: Defines the complete probability distribution over possible stochastic outcomes.
  - Structure: A static JAX array giving the likelihood of each "stochastic action".
  - Question: For most games this distribution is static for the whole game, but it may need to be dynamic per state, which is more complicated.
- New `Env` method: `stochastic_step(state, action)`
  - Purpose: Lets users override the internal random outcome of the previous step with a specific random event.
  - Mechanism: It effectively "rewinds" the random part of the transition and applies the chance event corresponding to the provided action index.
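A minimal sketch of how the three proposed pieces could fit together, using a hypothetical toy environment written as plain JAX functions rather than the pgx `Env` class. One assumed design choice: the state keeps a snapshot of the pre-chance result (`pre_chance_score`, a hypothetical field) so that `stochastic_step` can "rewind" the random part without replaying the deterministic move.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp

# Hypothetical toy environment (not the pgx API): each move adds `action` to a
# score, then a fair die roll (outcome i -> i + 1) is added automatically.
NUM_STOCHASTIC_ACTIONS = 6

# Proposed Env attribute: a static distribution over chance outcomes.
stochastic_action_probs = jnp.full((NUM_STOCHASTIC_ACTIONS,), 1.0 / NUM_STOCHASTIC_ACTIONS)


class State(NamedTuple):
    pre_chance_score: jax.Array  # assumed snapshot taken before the chance event
    score: jax.Array             # score after the chance event
    is_stochastic: jax.Array     # proposed State attribute


def _apply_chance(score, stochastic_action):
    # Chance part of the transition for a given outcome index.
    return score + stochastic_action + 1


def init():
    zero = jnp.int32(0)
    return State(zero, zero, jnp.array(False))


def step(state, action, key):
    # Deterministic part first, then the automatic random event, so callers
    # that never use stochastic_step see the same behavior as before.
    pre = state.score + action
    outcome = jax.random.choice(key, NUM_STOCHASTIC_ACTIONS, p=stochastic_action_probs)
    return State(pre, _apply_chance(pre, outcome), jnp.array(True))


def stochastic_step(state, stochastic_action):
    # "Rewind" the random part: recompute from the pre-chance snapshot with the
    # requested outcome; leave the state unchanged if it was not stochastic.
    overridden = _apply_chance(state.pre_chance_score, stochastic_action)
    return state._replace(score=jnp.where(state.is_stochastic, overridden, state.score))
```

Under these assumptions, an MCTS chance node would call `step` once and then `stochastic_step(state, i)` for each `i` with nonzero `stochastic_action_probs[i]`. Using `jnp.where` instead of Python branching keeps the override a no-op for non-stochastic states while remaining traceable under `jax.jit`.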
I wanted to discuss whether this is something we could add, and I'm excited to get feedback on the idea. I can open PRs adding these changes to backgammon and 2048.
Questions, alternatives, feedback:
- Rather than providing backwards compatibility, we could make the change explicitly. This would be more efficient from a coding perspective but would break backwards compatibility for 2048, backgammon, and bridge.
- This assumes a single stochastic action distribution for the entire game, which may not be flexible enough for bridge/card games where the probabilities change after each stochastic state change. Maybe we could add a mask such as `legal_stochastic_action`?
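A hedged sketch of the mask idea for card games, assuming the distribution is derived per state from a hypothetical `legal_stochastic_action` mask of cards still in the deck: illegal outcomes get zero probability and the remaining ones are renormalized. The toy deck and all function names here are illustrative, not pgx APIs.

```python
import jax.numpy as jnp

# Hypothetical toy deck (not a pgx API): one boolean per card, True if the card
# is still in the deck and can therefore be dealt by the next chance event.
NUM_CARDS = 4


def legal_stochastic_action(deck):
    # Assumed mask: an outcome is legal only if that card has not been dealt yet.
    return deck


def stochastic_action_probs(deck):
    # Per-state distribution: uniform over legal outcomes, zero elsewhere.
    mask = legal_stochastic_action(deck).astype(jnp.float32)
    return mask / jnp.maximum(mask.sum(), 1.0)


def stochastic_step(deck, card):
    # Dealing a specific card removes it, which changes the next distribution.
    return deck.at[card].set(False)


deck = jnp.ones((NUM_CARDS,), dtype=bool)
print(stochastic_action_probs(deck))  # uniform over the 4 cards
deck = stochastic_step(deck, 2)
print(stochastic_action_probs(deck))  # card 2 now has probability 0
```

With this shape, games with a static distribution simply return a constant array, so a function-valued `stochastic_action_probs(state)` could cover both cases.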