Implementing AlphaZero Rust agent #346
Merged
adamantivm merged 16 commits into jonbinney:main (Feb 27, 2026)
Conversation
added 9 commits
February 25, 2026 09:58
- Add rand_distr to Cargo.toml as an optional dependency behind the binary feature
- Create GameState struct in game_state.rs with new, step, clone_and_step, is_game_over, check_win, get_action_mask, get_fast_hash, and policy_size methods
- Update the ActionSelector trait signature to take &GameState instead of individual params
- Update RandomAgent and OnnxAgent to use the new signature
- Update play_game in game_runner.rs to use GameState
- Add stub files for the alphazero module (evaluator, mcts, agent)
- All 99 existing tests pass
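A minimal sketch of the GameState pattern the commit describes. The field layout and board representation here are assumptions for illustration (the real struct bundles grid, positions, walls, and current player); the point is that clone_and_step returns a fresh state rather than mutating, which is what tree search needs when expanding children.

```rust
// Hypothetical sketch, not the repository's actual game_state.rs.
#[derive(Clone)]
struct GameState {
    grid: Vec<u8>,          // flattened board cells (assumed representation)
    current_player: usize,  // whose turn it is (0 or 1)
    step_count: usize,
}

impl GameState {
    fn new() -> Self {
        GameState { grid: vec![0; 25], current_player: 0, step_count: 0 }
    }

    // Returns a new state instead of mutating in place, so a search tree
    // can hold many positions derived from the same parent.
    fn clone_and_step(&self, action: usize) -> Self {
        let mut next = self.clone();
        next.grid[action] = (next.current_player + 1) as u8;
        next.current_player = 1 - next.current_player;
        next.step_count += 1;
        next
    }

    fn is_game_over(&self) -> bool {
        self.step_count >= self.grid.len()
    }
}

fn main() {
    let s = GameState::new();
    let s2 = s.clone_and_step(3);
    assert_eq!(s.step_count, 0); // original state untouched
    assert_eq!(s2.grid[3], 1);
    assert_eq!(s2.current_player, 1);
    assert!(!s2.is_game_over());
    println!("ok");
}
```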
- Create Evaluator trait with an evaluate() method for NN evaluation
- Create OnnxEvaluator implementing the Evaluator trait
- Create an arena-based MCTS implementation with:
  - NodeArena wrapping Vec&lt;Node&gt; with indices
  - Node struct with lazy game-state expansion
  - PUCT selection with the UCB formula
  - Backpropagation with value alternation
  - Dirichlet noise at the root
  - Visited-state penalty for loop avoidance
- Add MCTSConfig and ChildInfo structs
- Add comprehensive tests for the evaluator, node operations, and MCTS search
- All 111 tests pass
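The arena pattern and PUCT selection above can be sketched as follows. This is illustrative only (field names and the exact Q initialization are assumptions, not the PR's code): nodes live in a Vec and refer to each other by index, which sidesteps the lifetime and borrow-checker issues a pointer-based tree raises in Rust. The selection score is the standard PUCT formula Q + c_puct * P * sqrt(N_parent) / (1 + N_child).

```rust
// Illustrative arena-based tree; not the repository's mcts.rs.
struct Node {
    parent: Option<usize>, // index into the arena, not a reference
    children: Vec<usize>,
    visits: u32,
    value_sum: f32,
    prior: f32, // P(s, a) from the network
}

struct NodeArena {
    nodes: Vec<Node>,
}

impl NodeArena {
    fn new() -> Self {
        NodeArena { nodes: Vec::new() }
    }

    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    // PUCT: Q + c_puct * P * sqrt(N_parent) / (1 + N_child).
    // Unvisited children use Q = 0 here (a simplifying assumption).
    fn puct_child(&self, parent: usize, c_puct: f32) -> Option<usize> {
        let p = &self.nodes[parent];
        let sqrt_n = (p.visits as f32).sqrt();
        p.children.iter().copied().max_by(|&a, &b| {
            let score = |i: usize| {
                let n = &self.nodes[i];
                let q = if n.visits > 0 { n.value_sum / n.visits as f32 } else { 0.0 };
                q + c_puct * n.prior * sqrt_n / (1.0 + n.visits as f32)
            };
            score(a).partial_cmp(&score(b)).unwrap()
        })
    }
}

fn main() {
    let mut arena = NodeArena::new();
    let root = arena.push(Node { parent: None, children: Vec::new(), visits: 10, value_sum: 0.0, prior: 1.0 });
    let a = arena.push(Node { parent: Some(root), children: Vec::new(), visits: 5, value_sum: 4.0, prior: 0.5 });
    let b = arena.push(Node { parent: Some(root), children: Vec::new(), visits: 1, value_sum: 0.0, prior: 0.5 });
    arena.nodes[root].children = vec![a, b];
    // a has high Q (0.8) and a small bonus; b has Q = 0 and a larger bonus.
    assert_eq!(arena.puct_child(root, 1.0), Some(a));
    println!("ok");
}
```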
- Create AlphaZeroAgent in agents/alphazero/agent.rs with:
  - MCTS search integration with OnnxEvaluator
  - Temperature-based action sampling (greedy or proportional)
  - drop_t_on_step for switching to greedy after N steps
  - Visited-state tracking for the loop penalty
  - reset_game() method for between-game cleanup
- Add AlphaZeroConfig to selfplay_config.rs:
  - YAML keys match the Python format for config reusability
  - Supports mcts_n, mcts_k, mcts_c_puct, and noise settings
  - merge() method for self_play section overrides
- Update selfplay.rs:
  - Add --agent-type CLI arg (onnx, alphazero, random)
  - BoxedAgent enum for dynamic dispatch
  - Reset AlphaZero agents between games
  - Print the MCTS config when using alphazero
- Add tests for temperature sampling and config parsing
- All 118 tests pass, binary compiles
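The temperature-based sampling mentioned above can be sketched like this. The function signature is hypothetical (the real agent uses the rand/rand_distr crates; this sketch takes a caller-supplied uniform random number to stay dependency-free): temperature 0 picks the most-visited action greedily, and temperature t > 0 samples proportionally to visit counts raised to 1/t.

```rust
// Hypothetical sketch of temperature-based action selection over MCTS
// visit counts; not the repository's agent.rs.
fn sample_action(visit_counts: &[u32], temperature: f32, uniform: f32) -> usize {
    if temperature == 0.0 {
        // Greedy: argmax over visit counts (the drop_t_on_step behavior).
        return visit_counts
            .iter()
            .enumerate()
            .max_by_key(|&(_, &c)| c)
            .map(|(i, _)| i)
            .unwrap();
    }
    // Proportional: weight each action by count^(1/t), then draw from
    // the resulting distribution using the supplied uniform in [0, 1).
    let weights: Vec<f32> = visit_counts
        .iter()
        .map(|&c| (c as f32).powf(1.0 / temperature))
        .collect();
    let total: f32 = weights.iter().sum();
    let mut threshold = uniform * total;
    for (i, w) in weights.iter().enumerate() {
        threshold -= w;
        if threshold <= 0.0 {
            return i;
        }
    }
    weights.len() - 1
}

fn main() {
    // Greedy (t = 0): the most-visited action wins.
    assert_eq!(sample_action(&[1, 7, 2], 0.0, 0.0), 1);
    // Proportional (t = 1): counts [0, 10] can only yield action 1.
    assert_eq!(sample_action(&[0, 10], 1.0, 0.5), 1);
    println!("ok");
}
```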
Future improvements documented:
- Batch search for better GPU utilization
- NN evaluation caching (LRU cache keyed by state hash)
- Undo-action tree traversal for efficiency
- Transposition table for recognizing identical states
- Root parallelization for multi-core CPUs
- Virtual loss for multi-threaded MCTS
- Progressive widening
- Various MCTS-specific optimizations (FPU, RAVE, etc.)
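As a sketch of the NN-evaluation-caching idea in that list: memoize (value, priors) by the game state's fast hash so repeated states skip inference. Everything here (struct name, unbounded HashMap instead of a real LRU) is an illustrative assumption, not planned code.

```rust
use std::collections::HashMap;

// Sketch of evaluation caching keyed by state hash. A production version
// would bound memory with an LRU eviction policy, as the PR notes.
struct EvalCache {
    map: HashMap<u64, (f32, Vec<f32>)>,
    hits: usize,
}

impl EvalCache {
    fn new() -> Self {
        EvalCache { map: HashMap::new(), hits: 0 }
    }

    // run_nn is only invoked on a cache miss.
    fn evaluate<F>(&mut self, state_hash: u64, run_nn: F) -> (f32, Vec<f32>)
    where
        F: FnOnce() -> (f32, Vec<f32>),
    {
        if let Some(cached) = self.map.get(&state_hash) {
            self.hits += 1;
            return cached.clone();
        }
        let result = run_nn();
        self.map.insert(state_hash, result.clone());
        result
    }
}

fn main() {
    let mut cache = EvalCache::new();
    let mut nn_calls = 0;
    for _ in 0..3 {
        cache.evaluate(42, || {
            nn_calls += 1;
            (0.5, vec![0.25; 4])
        });
    }
    assert_eq!(nn_calls, 1); // inference ran once; the rest were hits
    assert_eq!(cache.hits, 2);
    println!("ok");
}
```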
adamantivm
commented
Feb 25, 2026
added 6 commits
February 25, 2026 14:22
…it lifetimes to GameState accessors
Collaborator
Author
This is ready for review.
alejandromarcu
approved these changes
Feb 26, 2026
Collaborator
alejandromarcu
left a comment
Can't really understand it all, but it kind of makes sense 🤷. Please check the comment.
Also, I'm curious how fast it runs and whether the games now end in wins rather than draws. If you have results, please post them.
Collaborator
Author
@jonbinney are you OK with me merging this?
Owner
Yep, LGTM!
Rust AlphaZero MCTS Agent

Ports the Python AlphaZero agent (MCTS + neural network evaluation) to Rust, behind the binary feature flag.

Changes

- GameState wrapper — New struct in game_state.rs bundling grid, positions, walls, and current player. Simplifies the ActionSelector trait from 6 parameters to (&GameState, &[bool]). All existing agents and play_game updated accordingly.
- MCTS engine — Arena-based tree implementation in agents/alphazero/mcts.rs avoiding Rust lifetime complexity.
- OnnxEvaluator — Wraps ONNX inference, applies action masking, returns (value, priors) for MCTS.
- AlphaZeroAgent — Implements ActionSelector with temperature-based action sampling and automatic temperature drop after N steps.
- Config & CLI — Added AlphaZeroConfig to YAML parsing (keys match the Python format). New --agent-type alphazero flag in the selfplay binary.

Usage

```
selfplay --config experiments/ci.yaml \
  --model-path model.onnx \
  --num-games 100
```
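A hypothetical config fragment for the self_play section. The key names mcts_n, mcts_k, and mcts_c_puct come from the PR's commit messages; the values and the noise key names are placeholders, not values from the repository.

```yaml
# Illustrative only; the PR states the YAML keys match the Python format.
self_play:
  mcts_n: 200        # simulations per move (placeholder value)
  mcts_k: 8          # placeholder value
  mcts_c_puct: 1.5   # exploration constant (placeholder value)
  # ... plus Dirichlet-noise settings (exact keys not shown in the PR)
```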