Implementing AlphaZero Rust agent #346
Merged
adamantivm merged 16 commits into jonbinney:main (Feb 27, 2026)
Conversation
added 9 commits
February 25, 2026 09:58
- Add rand_distr to Cargo.toml as an optional dependency behind the binary feature
- Create GameState struct in game_state.rs with new, step, clone_and_step, is_game_over, check_win, get_action_mask, get_fast_hash, and policy_size methods
- Update the ActionSelector trait signature to take &GameState instead of individual params
- Update RandomAgent and OnnxAgent to use the new signature
- Update play_game in game_runner.rs to use GameState
- Add stub files for the alphazero module (evaluator, mcts, agent)
- All 99 existing tests pass
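A minimal sketch of the GameState pattern the commit describes. The field layout and board representation here are assumptions for illustration (the real struct bundles grid, positions, walls, and current player); the point is that clone_and_step returns a fresh state rather than mutating, which is what tree search needs when expanding children.

```rust
// Hypothetical sketch, not the repository's actual game_state.rs.
#[derive(Clone)]
struct GameState {
    grid: Vec<u8>,          // flattened board cells (assumed representation)
    current_player: usize,  // whose turn it is (0 or 1)
    step_count: usize,
}

impl GameState {
    fn new() -> Self {
        GameState { grid: vec![0; 25], current_player: 0, step_count: 0 }
    }

    // Returns a new state instead of mutating in place, so a search tree
    // can hold many positions derived from the same parent.
    fn clone_and_step(&self, action: usize) -> Self {
        let mut next = self.clone();
        next.grid[action] = (next.current_player + 1) as u8;
        next.current_player = 1 - next.current_player;
        next.step_count += 1;
        next
    }

    fn is_game_over(&self) -> bool {
        self.step_count >= self.grid.len()
    }
}

fn main() {
    let s = GameState::new();
    let s2 = s.clone_and_step(3);
    assert_eq!(s.step_count, 0); // original state untouched
    assert_eq!(s2.grid[3], 1);
    assert_eq!(s2.current_player, 1);
    assert!(!s2.is_game_over());
    println!("ok");
}
```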
- Create Evaluator trait with an evaluate() method for NN evaluation
- Create OnnxEvaluator implementing the Evaluator trait
- Create an arena-based MCTS implementation with:
  - NodeArena wrapping Vec&lt;Node&gt; with indices
  - Node struct with lazy game-state expansion
  - PUCT selection with the UCB formula
  - Backpropagation with value alternation
  - Dirichlet noise at the root
  - Visited-state penalty for loop avoidance
- Add MCTSConfig and ChildInfo structs
- Add comprehensive tests for the evaluator, node operations, and MCTS search
- All 111 tests pass
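The arena pattern and PUCT selection above can be sketched as follows. This is illustrative only (field names and the exact Q initialization are assumptions, not the PR's code): nodes live in a Vec and refer to each other by index, which sidesteps the lifetime and borrow-checker issues a pointer-based tree raises in Rust. The selection score is the standard PUCT formula Q + c_puct * P * sqrt(N_parent) / (1 + N_child).

```rust
// Illustrative arena-based tree; not the repository's mcts.rs.
struct Node {
    parent: Option<usize>, // index into the arena, not a reference
    children: Vec<usize>,
    visits: u32,
    value_sum: f32,
    prior: f32, // P(s, a) from the network
}

struct NodeArena {
    nodes: Vec<Node>,
}

impl NodeArena {
    fn new() -> Self {
        NodeArena { nodes: Vec::new() }
    }

    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    // PUCT: Q + c_puct * P * sqrt(N_parent) / (1 + N_child).
    // Unvisited children use Q = 0 here (a simplifying assumption).
    fn puct_child(&self, parent: usize, c_puct: f32) -> Option<usize> {
        let p = &self.nodes[parent];
        let sqrt_n = (p.visits as f32).sqrt();
        p.children.iter().copied().max_by(|&a, &b| {
            let score = |i: usize| {
                let n = &self.nodes[i];
                let q = if n.visits > 0 { n.value_sum / n.visits as f32 } else { 0.0 };
                q + c_puct * n.prior * sqrt_n / (1.0 + n.visits as f32)
            };
            score(a).partial_cmp(&score(b)).unwrap()
        })
    }
}

fn main() {
    let mut arena = NodeArena::new();
    let root = arena.push(Node { parent: None, children: Vec::new(), visits: 10, value_sum: 0.0, prior: 1.0 });
    let a = arena.push(Node { parent: Some(root), children: Vec::new(), visits: 5, value_sum: 4.0, prior: 0.5 });
    let b = arena.push(Node { parent: Some(root), children: Vec::new(), visits: 1, value_sum: 0.0, prior: 0.5 });
    arena.nodes[root].children = vec![a, b];
    // a has high Q (0.8) and a small bonus; b has Q = 0 and a larger bonus.
    assert_eq!(arena.puct_child(root, 1.0), Some(a));
    println!("ok");
}
```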
- Create AlphaZeroAgent in agents/alphazero/agent.rs with:
  - MCTS search integration with OnnxEvaluator
  - Temperature-based action sampling (greedy or proportional)
  - drop_t_on_step for switching to greedy after N steps
  - Visited-state tracking for the loop penalty
  - reset_game() method for between-game cleanup
- Add AlphaZeroConfig to selfplay_config.rs:
  - YAML keys match the Python format for config reusability
  - Supports mcts_n, mcts_k, mcts_c_puct, and noise settings
  - merge() method for self_play section overrides
- Update selfplay.rs:
  - Add --agent-type CLI arg (onnx, alphazero, random)
  - BoxedAgent enum for dynamic dispatch
  - Reset AlphaZero agents between games
  - Print the MCTS config when using alphazero
- Add tests for temperature sampling and config parsing
- All 118 tests pass, binary compiles
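The temperature-based sampling mentioned above can be sketched like this. The function signature is hypothetical (the real agent uses the rand/rand_distr crates; this sketch takes a caller-supplied uniform random number to stay dependency-free): temperature 0 picks the most-visited action greedily, and temperature t > 0 samples proportionally to visit counts raised to 1/t.

```rust
// Hypothetical sketch of temperature-based action selection over MCTS
// visit counts; not the repository's agent.rs.
fn sample_action(visit_counts: &[u32], temperature: f32, uniform: f32) -> usize {
    if temperature == 0.0 {
        // Greedy: argmax over visit counts (the drop_t_on_step behavior).
        return visit_counts
            .iter()
            .enumerate()
            .max_by_key(|&(_, &c)| c)
            .map(|(i, _)| i)
            .unwrap();
    }
    // Proportional: weight each action by count^(1/t), then draw from
    // the resulting distribution using the supplied uniform in [0, 1).
    let weights: Vec<f32> = visit_counts
        .iter()
        .map(|&c| (c as f32).powf(1.0 / temperature))
        .collect();
    let total: f32 = weights.iter().sum();
    let mut threshold = uniform * total;
    for (i, w) in weights.iter().enumerate() {
        threshold -= w;
        if threshold <= 0.0 {
            return i;
        }
    }
    weights.len() - 1
}

fn main() {
    // Greedy (t = 0): the most-visited action wins.
    assert_eq!(sample_action(&[1, 7, 2], 0.0, 0.0), 1);
    // Proportional (t = 1): counts [0, 10] can only yield action 1.
    assert_eq!(sample_action(&[0, 10], 1.0, 0.5), 1);
    println!("ok");
}
```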
Future improvements documented:
- Batch search for better GPU utilization
- NN evaluation caching (LRU cache keyed by state hash)
- Undo-action tree traversal for efficiency
- Transposition table for recognizing identical states
- Root parallelization for multi-core CPUs
- Virtual loss for multi-threaded MCTS
- Progressive widening
- Various MCTS-specific optimizations (FPU, RAVE, etc.)
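As a sketch of the NN-evaluation-caching idea in that list: memoize (value, priors) by the game state's fast hash so repeated states skip inference. Everything here (struct name, unbounded HashMap instead of a real LRU) is an illustrative assumption, not planned code.

```rust
use std::collections::HashMap;

// Sketch of evaluation caching keyed by state hash. A production version
// would bound memory with an LRU eviction policy, as the PR notes.
struct EvalCache {
    map: HashMap<u64, (f32, Vec<f32>)>,
    hits: usize,
}

impl EvalCache {
    fn new() -> Self {
        EvalCache { map: HashMap::new(), hits: 0 }
    }

    // run_nn is only invoked on a cache miss.
    fn evaluate<F>(&mut self, state_hash: u64, run_nn: F) -> (f32, Vec<f32>)
    where
        F: FnOnce() -> (f32, Vec<f32>),
    {
        if let Some(cached) = self.map.get(&state_hash) {
            self.hits += 1;
            return cached.clone();
        }
        let result = run_nn();
        self.map.insert(state_hash, result.clone());
        result
    }
}

fn main() {
    let mut cache = EvalCache::new();
    let mut nn_calls = 0;
    for _ in 0..3 {
        cache.evaluate(42, || {
            nn_calls += 1;
            (0.5, vec![0.25; 4])
        });
    }
    assert_eq!(nn_calls, 1); // inference ran once; the rest were hits
    assert_eq!(cache.hits, 2);
    println!("ok");
}
```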
adamantivm
commented
Feb 25, 2026
added 6 commits
February 25, 2026 14:22
…it lifetimes to GameState accessors
Collaborator
Author
This is ready for review.
alejandromarcu
approved these changes
Feb 26, 2026
Collaborator
alejandromarcu
left a comment
Can't really understand it all, but it kind of makes sense 🤷. Please check the comment.
Also, I'm curious how fast it runs and whether the games now end in wins rather than draws. If you have results, please post them.
Collaborator
Author
@jonbinney are you OK with me merging this?
Owner
Yep, LGTM!
Rust AlphaZero MCTS Agent

Ports the Python AlphaZero agent (MCTS + neural network evaluation) to Rust, behind the binary feature flag.

Changes

- GameState wrapper — New struct in game_state.rs bundling grid, positions, walls, and current player. Simplifies the ActionSelector trait from 6 parameters to (&GameState, &[bool]). All existing agents and play_game updated accordingly.
- MCTS engine — Arena-based tree implementation in agents/alphazero/mcts.rs avoiding Rust lifetime complexity.
- OnnxEvaluator — Wraps ONNX inference, applies action masking, returns (value, priors) for MCTS.
- AlphaZeroAgent — Implements ActionSelector with temperature-based action sampling and automatic temperature drop after N steps.
- Config & CLI — Added AlphaZeroConfig to YAML parsing (keys match the Python format). New --agent-type alphazero flag in the selfplay binary.

Usage

```
selfplay --config experiments/ci.yaml \
  --model-path model.onnx \
  --num-games 100
```
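A hypothetical config fragment for the self_play section. The key names mcts_n, mcts_k, and mcts_c_puct come from the PR's commit messages; the values and the noise key names are placeholders, not values from the repository.

```yaml
# Illustrative only; the PR states the YAML keys match the Python format.
self_play:
  mcts_n: 200        # simulations per move (placeholder value)
  mcts_k: 8          # placeholder value
  mcts_c_puct: 1.5   # exploration constant (placeholder value)
  # ... plus Dirichlet-noise settings (exact keys not shown in the PR)
```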