Implementing AlphaZero RUST agent #346

Merged
adamantivm merged 16 commits into jonbinney:main from adamantivm:jac/rust-mcts
Feb 27, 2026

@adamantivm
Collaborator

@adamantivm adamantivm commented Feb 25, 2026

Rust AlphaZero MCTS Agent

Ports the Python AlphaZero agent (MCTS + neural network evaluation) to Rust, behind the binary feature flag.

Changes

GameState wrapper — New struct in game_state.rs bundling grid, positions, walls, and current player. Simplifies the ActionSelector trait from 6 parameters to (&GameState, &[bool]). All existing agents and play_game updated accordingly.
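
For readers who have not opened game_state.rs, here is a minimal sketch of what the simplified trait could look like. The field types, the `walls_remaining` name, the `select_action` method name, and `FirstLegalAgent` are assumptions made for illustration; the PR only states that the struct bundles grid, positions, walls, and current player, and that the trait now takes `(&GameState, &[bool])`.

```rust
/// Hypothetical field layout; only the four bundled concepts come from the PR.
#[derive(Clone, Debug, PartialEq)]
pub struct GameState {
    pub grid: Vec<Vec<i8>>,
    pub positions: [(usize, usize); 2],
    pub walls_remaining: [u32; 2],
    pub current_player: usize,
}

/// Assumed shape of the simplified trait: the state plus a legal-action mask.
pub trait ActionSelector {
    /// Returns the index of the chosen action, or None if nothing is legal.
    fn select_action(&mut self, state: &GameState, action_mask: &[bool]) -> Option<usize>;
}

/// Toy agent (not part of the PR) that picks the first legal action.
pub struct FirstLegalAgent;

impl ActionSelector for FirstLegalAgent {
    fn select_action(&mut self, _state: &GameState, action_mask: &[bool]) -> Option<usize> {
        action_mask.iter().position(|&legal| legal)
    }
}
```

Passing one borrowed struct keeps every agent signature stable if the state later gains fields.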

MCTS engine — Arena-based tree implementation in agents/alphazero/mcts.rs avoiding Rust lifetime complexity. Includes:

  • PUCT selection with configurable exploration constant
  • Lazy node expansion (game state cloned only when visited)
  • Dirichlet noise at root for exploration
  • Visited-state penalty to discourage loops
  • Win/loss backpropagation with alternating signs
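
The arena idea, PUCT selection, and sign-alternating backpropagation listed above can be sketched as follows. This is a simplified illustration, not the code in mcts.rs: the field names, the exact PUCT variant, and the value convention (a node's value is stored from the perspective of the player who moved into it) are assumptions. Lazy expansion, Dirichlet noise, and the visited-state penalty are left out.

```rust
/// Index into the arena; used instead of references between nodes.
pub type NodeId = usize;

#[derive(Debug, Clone)]
pub struct Node {
    pub parent: Option<NodeId>,
    pub children: Vec<NodeId>,
    pub prior: f32,
    pub visit_count: u32,
    pub value_sum: f32,
}

/// All nodes live in one Vec, so the tree needs no lifetimes or Rc/RefCell.
#[derive(Debug, Default)]
pub struct NodeArena {
    pub nodes: Vec<Node>,
}

impl NodeArena {
    /// Appends a node and links it to its parent, returning the new index.
    pub fn add(&mut self, parent: Option<NodeId>, prior: f32) -> NodeId {
        let id = self.nodes.len();
        self.nodes.push(Node { parent, children: Vec::new(), prior, visit_count: 0, value_sum: 0.0 });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    /// Mean value of a node; 0.0 while it is unvisited.
    pub fn q(&self, id: NodeId) -> f32 {
        let n = &self.nodes[id];
        if n.visit_count == 0 { 0.0 } else { n.value_sum / n.visit_count as f32 }
    }

    /// PUCT: pick the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child).
    pub fn select_child(&self, parent: NodeId, c_puct: f32) -> Option<NodeId> {
        let sqrt_parent = (self.nodes[parent].visit_count as f32).sqrt();
        let mut best: Option<(NodeId, f32)> = None;
        for &child in &self.nodes[parent].children {
            let c = &self.nodes[child];
            let score = self.q(child) + c_puct * c.prior * sqrt_parent / (1.0 + c.visit_count as f32);
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((child, score));
            }
        }
        best.map(|(id, _)| id)
    }

    /// Walks from the leaf to the root, flipping the sign at each level
    /// because consecutive levels belong to opposing players.
    pub fn backpropagate(&mut self, leaf: NodeId, value: f32) {
        let mut current = Some(leaf);
        let mut v = value;
        while let Some(id) = current {
            let n = &mut self.nodes[id];
            n.visit_count += 1;
            n.value_sum += v;
            v = -v;
            current = n.parent;
        }
    }
}
```

Storing parent and child links as plain indices is what lets backpropagation mutate ancestors without fighting the borrow checker.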

OnnxEvaluator — Wraps ONNX inference, applies action masking, returns (value, priors) for MCTS.
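
As an illustration of the masking step, here is a sketch that turns raw policy outputs into priors over legal actions. It assumes the network emits logits and that masking happens before a softmax; the actual evaluator may instead renormalize probabilities. `masked_priors` is a hypothetical helper name, and the ONNX call itself is omitted.

```rust
/// Softmax restricted to legal actions; illegal actions get probability 0.
/// Returns all zeros if no action is legal.
pub fn masked_priors(logits: &[f32], action_mask: &[bool]) -> Vec<f32> {
    assert_eq!(logits.len(), action_mask.len(), "mask must cover the whole policy");

    // Largest legal logit, subtracted below for numerical stability.
    let mut max_legal = f32::NEG_INFINITY;
    for (logit, legal) in logits.iter().zip(action_mask.iter()) {
        if *legal && *logit > max_legal {
            max_legal = *logit;
        }
    }
    if max_legal == f32::NEG_INFINITY {
        return vec![0.0; logits.len()];
    }

    let exps: Vec<f32> = logits
        .iter()
        .zip(action_mask.iter())
        .map(|(logit, legal)| if *legal { (*logit - max_legal).exp() } else { 0.0 })
        .collect();
    let total: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / total).collect()
}
```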

AlphaZeroAgent — Implements ActionSelector with temperature-based action sampling and automatic temperature drop after N steps.
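
A sketch of how temperature-based sampling over root visit counts might work. The function names, the `step >= limit` comparison for the temperature drop, and passing the uniform random number `u` in as an argument (to keep the example dependency-free) are all assumptions; the real agent presumably draws from an RNG.

```rust
/// Temperature to use at a given step: 0.0 (greedy) once the drop step is reached.
pub fn effective_temperature(step: usize, temperature: f32, drop_t_on_step: Option<usize>) -> f32 {
    match drop_t_on_step {
        Some(limit) if step >= limit => 0.0,
        _ => temperature,
    }
}

/// Converts visit counts to a distribution proportional to N^(1/T).
/// T <= 0 yields a one-hot vector on the most visited action (first on ties).
pub fn visit_distribution(visit_counts: &[u32], temperature: f32) -> Vec<f32> {
    if visit_counts.is_empty() {
        return Vec::new();
    }
    if temperature <= 0.0 {
        let mut best = 0;
        for (i, &n) in visit_counts.iter().enumerate() {
            if n > visit_counts[best] {
                best = i;
            }
        }
        let mut probs = vec![0.0; visit_counts.len()];
        probs[best] = 1.0;
        return probs;
    }
    let weights: Vec<f32> = visit_counts
        .iter()
        .map(|&n| (n as f32).powf(1.0 / temperature))
        .collect();
    let total: f32 = weights.iter().sum();
    if total == 0.0 {
        // No visits at all: fall back to a uniform distribution.
        return vec![1.0 / visit_counts.len() as f32; visit_counts.len()];
    }
    weights.into_iter().map(|w| w / total).collect()
}

/// Inverse-CDF sampling: `u` is a uniform random number in [0, 1).
pub fn sample_index(probs: &[f32], u: f32) -> usize {
    let mut cumulative = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    // Rounding can leave the cumulative sum slightly below 1.0.
    probs.iter().rposition(|&p| p > 0.0).unwrap_or(0)
}
```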

Config & CLI — Added AlphaZeroConfig to YAML parsing (keys match Python format). New --agent-type alphazero flag in selfplay binary.
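
One way the override logic could look, as a sketch only: `mcts_n`, `mcts_c_puct`, and `drop_t_on_step` are names that appear in this PR, but the field types, the `AlphaZeroOverrides` struct, and the `merge` signature are assumptions. YAML deserialization is omitted to keep the example dependency-free.

```rust
/// Hypothetical set of optional overrides, e.g. from a self_play section.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct AlphaZeroOverrides {
    pub mcts_n: Option<u32>,
    pub mcts_c_puct: Option<f32>,
    pub drop_t_on_step: Option<u32>,
}

/// Assumed subset of the agent configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct AlphaZeroConfig {
    pub mcts_n: u32,
    pub mcts_c_puct: f32,
    pub drop_t_on_step: Option<u32>,
}

impl AlphaZeroConfig {
    /// Returns a copy where every value present in `overrides` wins.
    pub fn merge(&self, overrides: &AlphaZeroOverrides) -> AlphaZeroConfig {
        AlphaZeroConfig {
            mcts_n: overrides.mcts_n.unwrap_or(self.mcts_n),
            mcts_c_puct: overrides.mcts_c_puct.unwrap_or(self.mcts_c_puct),
            drop_t_on_step: overrides.drop_t_on_step.or(self.drop_t_on_step),
        }
    }
}
```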

Usage

selfplay  --config experiments/ci.yaml \
         --model-path model.onnx \
         --num-games 100

Julian Cerruti added 9 commits February 25, 2026 09:58
- Add rand_distr to Cargo.toml as optional dependency behind binary feature
- Create GameState struct in game_state.rs with new, step, clone_and_step,
  is_game_over, check_win, get_action_mask, get_fast_hash, policy_size methods
- Update ActionSelector trait signature to take &GameState instead of individual params
- Update RandomAgent and OnnxAgent to use new signature
- Update play_game in game_runner.rs to use GameState
- Add stub files for alphazero module (evaluator, mcts, agent)
- All 99 existing tests pass
- Create Evaluator trait with evaluate() method for NN evaluation
- Create OnnxEvaluator implementing Evaluator trait
- Create arena-based MCTS implementation with:
  - NodeArena wrapping Vec<Node> with indices
  - Node struct with lazy game state expansion
  - PUCT selection with UCB formula
  - Backpropagation with value alternation
  - Dirichlet noise at root
  - Visited state penalty for loop avoidance
- Add MCTSConfig and ChildInfo structs
- Add comprehensive tests for evaluator, node operations, and MCTS search
- All 111 tests pass
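
For context on the root noise and loop penalty in the commit above, here is a hedged sketch. The mixing formula is the usual AlphaZero form (1 - eps) * p + eps * noise; the noise vector is passed in rather than sampled (the PR adds rand_distr for that) and is assumed to be zero on illegal actions. How the visited-state penalty is actually applied is not described in this PR, so `penalized_value` and its clamping are a guess.

```rust
use std::collections::HashSet;

/// Blends root priors with a pre-sampled noise vector of the same length.
pub fn mix_root_noise(priors: &[f32], noise: &[f32], epsilon: f32) -> Vec<f32> {
    assert_eq!(priors.len(), noise.len(), "noise must match the policy size");
    priors
        .iter()
        .zip(noise.iter())
        .map(|(p, n)| (1.0 - epsilon) * *p + epsilon * *n)
        .collect()
}

/// Hypothetical penalty: lowers the value of states already seen this game,
/// clamped so the result stays within [-1, 1] from below.
pub fn penalized_value(value: f32, state_hash: u64, visited: &HashSet<u64>, penalty: f32) -> f32 {
    if visited.contains(&state_hash) {
        (value - penalty).max(-1.0)
    } else {
        value
    }
}
```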
- Create AlphaZeroAgent in agents/alphazero/agent.rs with:
  - MCTS search integration with OnnxEvaluator
  - Temperature-based action sampling (greedy or proportional)
  - drop_t_on_step for switching to greedy after N steps
  - Visited state tracking for loop penalty
  - reset_game() method for between-game cleanup
- Add AlphaZeroConfig to selfplay_config.rs:
  - YAML keys match Python format for config reusability
  - Supports mcts_n, mcts_k, mcts_c_puct, noise settings
  - merge() method for self_play section overrides
- Update selfplay.rs:
  - Add --agent-type CLI arg (onnx, alphazero, random)
  - BoxedAgent enum for dynamic dispatch
  - Reset AlphaZero agents between games
  - Print MCTS config when using alphazero
- Add tests for temperature sampling and config parsing
- All 118 tests pass, binary compiles
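
The enum-based dispatch and between-game reset described in the commit above might look roughly like this. The three agent-type strings and the names `BoxedAgent` and `reset_game` come from the PR; the stub structs, the `visited_states` field, and `from_agent_type` are assumptions for illustration.

```rust
/// Stub agents standing in for the real implementations.
#[derive(Debug, Default)]
pub struct RandomAgent;

#[derive(Debug, Default)]
pub struct OnnxAgent;

#[derive(Debug, Default)]
pub struct AlphaZeroAgent {
    /// Hashes of states seen in the current game (used for the loop penalty).
    pub visited_states: Vec<u64>,
}

impl AlphaZeroAgent {
    /// Clears per-game state so one game does not leak into the next.
    pub fn reset_game(&mut self) {
        self.visited_states.clear();
    }
}

/// One variant per supported --agent-type value.
#[derive(Debug)]
pub enum BoxedAgent {
    Random(RandomAgent),
    Onnx(OnnxAgent),
    AlphaZero(AlphaZeroAgent),
}

impl BoxedAgent {
    /// Hypothetical constructor mapping the CLI string to a variant.
    pub fn from_agent_type(name: &str) -> Result<BoxedAgent, String> {
        match name {
            "random" => Ok(BoxedAgent::Random(RandomAgent)),
            "onnx" => Ok(BoxedAgent::Onnx(OnnxAgent)),
            "alphazero" => Ok(BoxedAgent::AlphaZero(AlphaZeroAgent::default())),
            other => Err(format!("unknown agent type: {other}")),
        }
    }

    /// Only AlphaZero agents carry per-game state; other variants are no-ops.
    pub fn reset_game(&mut self) {
        if let BoxedAgent::AlphaZero(agent) = self {
            agent.reset_game();
        }
    }
}
```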
Future improvements documented:
- Batch search for better GPU utilization
- NN evaluation caching (LRU cache by state hash)
- Undo-action tree traversal for efficiency
- Transposition table for recognizing same states
- Root parallelization for multi-core CPUs
- Virtual loss for multi-threaded MCTS
- Progressive widening
- Various MCTS-specific optimizations (FPU, RAVE, etc.)
@adamantivm adamantivm changed the title from "DRAFT: Implementing AlphaZero RUST agent" to "Implementing AlphaZero RUST agent" Feb 25, 2026
@adamantivm
Collaborator Author

This is ready for review.

Collaborator

@alejandromarcu alejandromarcu left a comment

I can't really understand it all, but it kind of makes sense 🤷.
Please check the comment.
Also, I'm curious how fast it runs and whether the games now end in wins rather than draws. If you have results, please post them.

@adamantivm
Collaborator Author

@jonbinney are you OK with me merging this?

@jonbinney
Owner

Yep, LGTM!

@adamantivm adamantivm merged commit 91d4660 into jonbinney:main Feb 27, 2026
2 checks passed
3 participants