fix(cfr): include remaining stack in board enumeration rewards and guard sequential pruning #275
Merged
elliottneilclark merged 1 commit into master on Apr 17, 2026
…ard sequential pruning

The board enumeration reward functions computed `pot_share - starting_stack` but omitted the player's remaining stack after betting. After `fast_forward_advance_betting` moves chips from stacks into the pot, the correct net reward is `remaining_stack + pot_share - starting_stack`.

Also adds the `indexed_actions.len() > 2` guard to the sequential pruning path so it matches the parallel path. Without it, 2-action nodes could have one action pruned on 75% of iterations, collapsing to a fixed policy.
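For reviewers who want to check the arithmetic, here is a minimal sketch of both fixes. It is not the code from this PR: the function names, the `f32` chip type, the `(action index, regret)` pair layout, and the threshold-based pruning rule are all assumptions made for illustration. Only the reward formula and the `indexed_actions.len() > 2` guard come from the description above.

```rust
/// Net reward after betting has already moved chips from the stack into the pot.
/// Hypothetical helper; all values are in chips.
fn net_reward(starting_stack: f32, remaining_stack: f32, pot_share: f32) -> f32 {
    remaining_stack + pot_share - starting_stack
}

/// The pre-fix formula, kept only for comparison: it ignores chips still behind.
fn net_reward_without_remaining(starting_stack: f32, pot_share: f32) -> f32 {
    pot_share - starting_stack
}

/// Hypothetical pruning step over (action index, regret) pairs.
/// Actions with regret below `threshold` are dropped, but only when more than
/// two actions exist, so a 2-action node always keeps both of its actions.
fn prune_actions(indexed_actions: Vec<(usize, f32)>, threshold: f32) -> Vec<(usize, f32)> {
    if indexed_actions.len() > 2 {
        let kept: Vec<(usize, f32)> = indexed_actions
            .iter()
            .copied()
            .filter(|&(_, regret)| regret >= threshold)
            .collect();
        // Never prune down to an empty action set.
        if !kept.is_empty() {
            return kept;
        }
    }
    indexed_actions
}
```

As a worked case under these assumptions: a player starts with 100 chips, bets 40 (60 remaining), is called, and wins the whole 80-chip pot. `net_reward(100.0, 60.0, 80.0)` gives `40.0`, the amount actually won, while the old formula gives `-20.0`. If the same player loses, the corrected value is `-40.0` rather than `-100.0`.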
087738b to ac56557