perf(cfr): single shared tree, compact action space, regret-based pruning, and board enumeration (#274)
Implement four complementary optimizations from the CFR literature that
together reduce memory usage by ~80% and improve convergence quality.
Previously each player maintained an independent CFR tree (`Vec<CFRState>`).
Because the traversal records all information (no information hiding),
every player's tree was structurally identical, so all nodes and regret
data were duplicated. This commit replaces the per-player vec with a
single `CFRState`
shared by all players. For a 2-player game this halves the NodeArena
memory. The historian now creates nodes once and moves all traversal
states together, and sub-agent construction no longer clones a vec of
Arc-backed states.
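
As a rough illustration of the shared-tree idea, the sketch below uses hypothetical `SharedState`, `NodeArena`, and `Node` types (stand-ins, not the crate's actual `CFRState` or `NodeArena`). It only shows that cloning an Arc-backed handle gives every player a view of the same nodes, so a node is created once rather than once per player.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical node: one regret entry per action index.
struct Node {
    regrets: Vec<f32>,
}

/// Hypothetical arena holding every node of the tree.
#[derive(Default)]
struct NodeArena {
    nodes: Vec<Node>,
}

/// Hypothetical stand-in for a shared, Arc-backed state handle.
/// Cloning copies the pointer, not the arena behind it.
#[derive(Clone, Default)]
struct SharedState {
    arena: Arc<RwLock<NodeArena>>,
}

impl SharedState {
    /// Create a node once and return its index in the arena.
    fn add_node(&self, num_actions: usize) -> usize {
        let mut arena = self.arena.write().unwrap();
        arena.nodes.push(Node { regrets: vec![0.0; num_actions] });
        arena.nodes.len() - 1
    }

    /// Number of nodes visible through this handle.
    fn node_count(&self) -> usize {
        self.arena.read().unwrap().nodes.len()
    }

    /// Total regret slots stored across all nodes.
    fn regret_slots(&self) -> usize {
        let arena = self.arena.read().unwrap();
        arena.nodes.iter().map(|n| n.regrets.len()).sum()
    }
}
```

A node added through one player's handle is immediately visible through another player's clone, which is why a per-player vec of states is no longer needed in this layout.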
The action index mapper allocated 52 slots per regret matcher (fold,
call, 49 raise buckets, all-in), but typical action generators produce
only 4-8 distinct bet sizes. The 49 raise slots provided ~2% log-space
resolution, far finer than needed. Reducing to 12 raise slots (indices
2-13) plus fold, call, all-in, and one reserved slot gives 16 total
indices with ~38% resolution, still sufficient to distinguish standard
pot-fraction bets (0.33x, 0.67x, 1.0x, 1.5x). Each regret matcher
stores a weight vector sized to `NUM_ACTION_INDICES`, so going from 52
to 16 indices reduces per-node regret storage by ~69%.
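
A minimal sketch of a log-space raise mapper with 12 slots, plus the storage arithmetic. The raise-size bounds (`MIN_POT_FRACTION`, `MAX_POT_FRACTION`), the function names, and the positions of the all-in and reserved indices are assumptions for illustration; the actual mapper may bucket differently.

```rust
// Index layout from the text: fold, call, 12 raise slots (2-13), all-in,
// and one reserved slot. Placing all-in at 14 and the reserved slot at 15
// is an assumption.
const FOLD: usize = 0;
const CALL: usize = 1;
const RAISE_BASE: usize = 2;
const NUM_RAISE_SLOTS: usize = 12;
const ALL_IN: usize = 14;
const NUM_ACTION_INDICES: usize = 16;

// Hypothetical bounds on raise size, expressed as a fraction of the pot.
const MIN_POT_FRACTION: f64 = 0.2;
const MAX_POT_FRACTION: f64 = 10.0;

/// Map a raise (as a pot fraction) to one of the raise indices using
/// buckets that are evenly spaced in log space.
fn raise_index(pot_fraction: f64) -> usize {
    let f = pot_fraction.clamp(MIN_POT_FRACTION, MAX_POT_FRACTION);
    // Position of `f` within the log range, in [0, 1].
    let t = (f / MIN_POT_FRACTION).ln() / (MAX_POT_FRACTION / MIN_POT_FRACTION).ln();
    let bucket = ((t * NUM_RAISE_SLOTS as f64) as usize).min(NUM_RAISE_SLOTS - 1);
    RAISE_BASE + bucket
}

/// Fraction of storage saved when a weight vector shrinks.
fn storage_reduction(old_slots: usize, new_slots: usize) -> f64 {
    1.0 - new_slots as f64 / old_slots as f64
}
```

With these assumed bounds, the four standard pot-fraction bets (0.33x, 0.67x, 1.0x, 1.5x) land in four different raise slots, and `storage_reduction(52, 16)` evaluates to roughly 0.69.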
After a warmup period (3 updates), actions whose cumulative regret has
been driven to zero by PCFR+ clamping are skipped during reward
computation. Computing rewards requires expensive recursive
sub-simulations, so skipping dead actions saves significant wall-clock
time. Every 4th iteration all actions are reprobed to detect actions
that have become relevant again. The pruning integrates into both the
sequential and parallel (rayon) reward computation paths.
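
The pruning rule can be sketched as a function that returns a bitmask of actions to evaluate. The warmup length and reprobe interval come from the description above; the function name, the use of a `u32` mask, and the fallback when every action would be pruned are assumptions rather than the commit's actual code.

```rust
const WARMUP_UPDATES: u32 = 3; // warmup length from the text
const REPROBE_INTERVAL: u32 = 4; // reprobe every 4th iteration, from the text

/// Return a bitmask of the actions whose rewards should be computed.
/// Bit `i` is set when action `i` is active.
fn active_action_mask(cumulative_regret: &[f32], updates: u32, iteration: u32) -> u32 {
    assert!(cumulative_regret.len() < 32, "sketch supports fewer than 32 actions");
    let all = (1u32 << cumulative_regret.len()) - 1;

    // During warmup, and on every reprobe iteration, evaluate everything.
    if updates < WARMUP_UPDATES || iteration % REPROBE_INTERVAL == 0 {
        return all;
    }

    // Otherwise skip actions whose regret has been clamped to zero.
    let mut mask = 0u32;
    for (i, &regret) in cumulative_regret.iter().enumerate() {
        if regret > 0.0 {
            mask |= 1u32 << i;
        }
    }

    // Assumption: never prune every action; fall back to a full probe.
    if mask == 0 { all } else { mask }
}
```

A caller would compute rewards only for the set bits, in either the sequential or the parallel path, and leave the pruned actions' rewards untouched for that iteration.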
When computing fast-forward rewards at leaf depth, the previous approach
sampled a single random board completion. This introduced variance into
the reward signal, which slowed CFR convergence. The new approach depends
on how many community cards remain:
- 0 cards (showdown): 1 evaluation, deterministic.
- 1 card (river only): ~46 evaluations, full enumeration.
- 2 cards (turn + river): ~C(46,2) = 1035 evaluations, full enumeration.
- 3 cards (flop): sample k=3 random flops, then enumerate all turn+river
  combinations for each (~946 evals per flop, ~2838 total). This hybrid
  approach gives 3.8x variance reduction vs single-sample at only 2x the
  cost of k=1. Testing k=1,2,3,5,8,13 showed diminishing returns beyond
  k=3 (each additional sample adds <0.5x reduction while doubling cost).
The 0-2 card cases produce zero-variance reward signals. The 3-card
hybrid approach substantially reduces variance from the flop while
keeping cost tractable for the CFR inner loop. This is related to the
AIVAT variance reduction technique (Burch et al., AAAI 2018).
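
The fully enumerated cases (0, 1, or 2 missing cards) can be sketched as below. Cards are represented as plain `u8` ids and the payoff is a caller-supplied closure; both are hypothetical simplifications, and the k-flop sampling step of the hybrid case is left out because it needs a random source.

```rust
/// Average `payoff` over every way to complete the board with `missing`
/// cards drawn from `remaining`. Returns (mean payoff, evaluation count).
/// Assumes `remaining` holds at least `missing` cards.
fn enumerate_runouts<F>(remaining: &[u8], missing: usize, payoff: F) -> (f64, usize)
where
    F: Fn(&[u8]) -> f64,
{
    let mut total = 0.0;
    let mut evals = 0usize;
    match missing {
        // Showdown: the board is complete, so there is a single evaluation.
        0 => {
            total += payoff(&[]);
            evals += 1;
        }
        // One card to come: try every remaining card.
        1 => {
            for &a in remaining {
                total += payoff(&[a]);
                evals += 1;
            }
        }
        // Two cards to come: try every unordered pair, C(n, 2) in total.
        2 => {
            for (i, &a) in remaining.iter().enumerate() {
                for &b in &remaining[i + 1..] {
                    total += payoff(&[a, b]);
                    evals += 1;
                }
            }
        }
        _ => panic!("this sketch only enumerates 0, 1, or 2 missing cards"),
    }
    (total / evals as f64, evals)
}
```

With 46 remaining cards this performs 46 evaluations for one missing card and 1035 for two; with 44 remaining cards the two-card case performs 946, matching the per-flop count quoted above. Because every completion is visited exactly once, the returned mean carries no sampling variance.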
API changes:
- `CFRAgentBuilder::cfr_states(Vec<CFRState>)` → `cfr_state(CFRState)`
- `HoldemSimulationBuilder::cfr_context_arc()` removed; `cfr_context()`
  now takes a single `CFRState` instead of `Vec<CFRState>`
- `CFRHistorian::new()` takes `&CFRState` instead of `&Arc<[CFRState]>`
- `ConfigAgentBuilder::cfr_context()` takes `CFRState` instead of vec
Tests:
- `test_rbp_preserves_fold_decision`: verifies pruning still produces
  the correct fold for K-high facing an all-in on a paired board.
- `test_rbp_reduces_active_actions`: verifies the pruning bitset becomes
  sparse after warmup, confirming RBP activates and prunes dead actions.
- `test_flop_sample_variance_vs_single_sample`: confirms 3.8x variance
  reduction with k=3 flop samples vs single random runout.
- `test_flop_sample_dominated_hand`: AKs vs 72o produces correct
  positive EV (~160) matching theoretical 66% equity.
- `test_flop_sample_count_comparison`: parameterized comparison across
  k=1,2,3,5,8,13 showing cost/variance tradeoff (k=3 is the knee).
- All existing CFR tests updated for the single-state API.