Rust selfplay integration with play v2 #347

Merged
adamantivm merged 13 commits into jonbinney:main from adamantivm:copilot-worktree-2026-02-25T20-21-57
Mar 2, 2026
@adamantivm
Collaborator

Summary

This PR connects the Rust self-play binary with the Python v2 training pipeline, enabling the trainer to use fast Rust-based self-play instead of Python workers. The integration is activated with a single --rust-selfplay flag.

Changes

1. Switch Python to npz format (5e68672)

  • alphazero.py: Changed end_game_batch_and_save_replay_buffers from pickle to npz with Rust-compatible field names (input_arrays, policies, action_masks, values, players)
  • trainer.py: Updated replay reading from .pkl to .npz, with conversion to per-sample dicts at read time
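
To illustrate the npz layout described above, here is a minimal sketch of writing a replay buffer with the five field names and converting it back to per-sample dicts at read time. The function names `save_replay_npz` and `load_replay_samples`, as well as any array shapes and dtypes, are assumptions for illustration and are not taken from alphazero.py or trainer.py.

```python
import numpy as np

# Field names shared with the Rust writer, as listed in this PR.
REPLAY_FIELDS = ("input_arrays", "policies", "action_masks", "values", "players")


def save_replay_npz(file, **arrays):
    """Write one game's samples as an npz archive (hypothetical helper)."""
    missing = set(REPLAY_FIELDS) - set(arrays)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    lengths = {len(arrays[name]) for name in REPLAY_FIELDS}
    if len(lengths) != 1:
        raise ValueError("all fields must have the same number of samples")
    np.savez(file, **{name: arrays[name] for name in REPLAY_FIELDS})


def load_replay_samples(file):
    """Read an npz archive and return a list of per-sample dicts."""
    with np.load(file) as data:
        # Materialize each array once; NpzFile decompresses on every access.
        columns = {name: data[name] for name in REPLAY_FIELDS}
    num_samples = len(columns["values"])
    return [
        {name: columns[name][i] for name in REPLAY_FIELDS}
        for i in range(num_samples)
    ]
```

Keeping the on-disk format columnar (one array per field) and converting to dicts only at read time means the Python and Rust writers only need to agree on field names and array shapes.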

2. Add --model-version CLI parameter (6d060bb)

  • selfplay.rs: Added --model-version CLI arg to stamp replay buffer metadata with the correct model version

3. Add continuous mode with model hot-reload (903f8d7)

  • selfplay.rs: Added --continuous, --latest-model-yaml, --shutdown-file CLI args
  • selfplay_config.rs: Added LatestModelYaml struct and load_latest_model() function
  • Rust binary polls latest.yaml between games, reloads ONNX model on version change, exits on shutdown sentinel file
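
The control flow of continuous mode can be sketched as follows. This is a Python illustration of the loop described above, not the Rust code in selfplay.rs: the function `run_continuous` and its callback parameters (`read_latest_version`, `load_model`, `play_game`) are hypothetical, and how latest.yaml is actually parsed is left to the caller.

```python
import time
from pathlib import Path
from typing import Any, Callable, Optional


def run_continuous(
    shutdown_file: Path,
    read_latest_version: Callable[[], Optional[int]],
    load_model: Callable[[int], Any],
    play_game: Callable[[Any, int], None],
    poll_seconds: float = 1.0,
) -> int:
    """Play games until the shutdown sentinel appears; return games played."""
    current_version: Optional[int] = None
    model: Any = None
    games_played = 0

    while not shutdown_file.exists():
        # Check for a new model between games, never in the middle of one.
        latest = read_latest_version()
        if latest is not None and latest != current_version:
            model = load_model(latest)
            current_version = latest

        if model is None:
            # No model published yet: wait and poll again.
            time.sleep(poll_seconds)
            continue

        play_game(model, current_version)
        games_played += 1

    return games_played
```

Checking the sentinel and the model version only at game boundaries keeps each replay buffer tied to a single model version.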

4. Add --rust-selfplay flag to Python trainer (182003f)

  • train_v2.py: Added --rust-selfplay arg that spawns Rust self-play processes in continuous mode
  • config.py: Added rust_selfplay_binary field to SelfPlayConfig
  • Auto-enables ONNX export when Rust self-play is active
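
As a rough sketch of how the trainer side could wire this up, the snippet below parses the `--rust-selfplay` flag and builds a command line for the Rust binary using only the continuous-mode arguments named in this PR. The helper names and the paths passed in are assumptions; the real train_v2.py may pass further arguments and uses `rust_selfplay_binary` from `SelfPlayConfig` for the binary location.

```python
import argparse
from pathlib import Path
from typing import List


def build_arg_parser() -> argparse.ArgumentParser:
    """Minimal stand-in for the trainer CLI (hypothetical)."""
    parser = argparse.ArgumentParser(description="train_v2 sketch")
    parser.add_argument("config", help="path to the experiment yaml")
    parser.add_argument(
        "--rust-selfplay",
        action="store_true",
        help="use the Rust self-play binary instead of Python workers",
    )
    return parser


def rust_selfplay_command(
    binary: Path, latest_model_yaml: Path, shutdown_file: Path
) -> List[str]:
    """Build the argv for one Rust self-play worker in continuous mode."""
    return [
        str(binary),
        "--continuous",
        "--latest-model-yaml",
        str(latest_model_yaml),
        "--shutdown-file",
        str(shutdown_file),
    ]

# A worker could then be started with subprocess.Popen(rust_selfplay_command(...))
# and stopped by creating the shutdown file.
```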

5. Fix race conditions + ResNet CI config (8e6c94e)

  • trainer.py: Moved LatestModel.write() after ONNX export to prevent consumers from reading yaml before model files exist
  • selfplay.rs: Added wait-for-ONNX-file logic in continuous mode
  • selfplay_config.rs: Added NetworkType enum and config parsing for alphazero.network.type
  • Added ci-resnet.yaml for testing with ResNet network type
  • Added MLP input support to Rust backlog
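
The ordering fix above can be sketched as a publish-then-point pattern: write the model file first, update the pointer file only afterwards, and have the consumer wait for the model file as a second line of defense. The helpers `publish_model` and `wait_for_file` are hypothetical, and the temp-file-plus-`os.replace` step is an extra precaution assumed here, not something this PR states it does.

```python
import os
import time
from pathlib import Path


def _atomic_write_bytes(path: Path, payload: bytes) -> None:
    """Write to a sibling temp file, then rename over the target."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(payload)
    os.replace(tmp, path)  # readers see either the old file or the new one


def publish_model(
    model_path: Path, model_bytes: bytes, pointer_path: Path, pointer_text: str
) -> None:
    """Producer side: model file first, pointer (e.g. latest.yaml) last."""
    _atomic_write_bytes(model_path, model_bytes)
    _atomic_write_bytes(pointer_path, pointer_text.encode("utf-8"))


def wait_for_file(
    path: Path, timeout_seconds: float = 5.0, poll_seconds: float = 0.05
) -> bool:
    """Consumer side: poll until the file exists or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_seconds)
    return path.exists()
```

With this ordering, a consumer that reads the pointer can assume the file it names already exists; the wait loop only covers slow filesystems or a producer that has not published anything yet.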

Usage

# Build the Rust binary
cd deep_quoridor/rust && cargo build --release --features binary --bin selfplay

# Run training with Rust self-play
cd deep_quoridor/src && python train_v2.py ../experiments/ci-resnet.yaml --rust-selfplay

Testing

  • All 118 Rust unit tests pass
  • End-to-end test verified with ResNet config (ci-resnet.yaml):
    • Rust workers load initial ONNX model, play games, produce npz replay buffers
    • Python trainer consumes buffers, trains models, exports ONNX
    • Rust workers detect model hot-reloads (v0→v1→v2)
    • Graceful shutdown via sentinel file works correctly

Deferred

  • MLP network input support in Rust (tracked in deep_quoridor/rust/backlog.md)

Julian Cerruti added 9 commits February 26, 2026 20:13
- Move LatestModel.write() after ONNX export in trainer.py to prevent
  Rust self-play from reading latest.yaml before model files exist
- Add wait-for-ONNX-file logic in Rust continuous mode (both initial
  model wait and model reload check)
- Add NetworkType enum and network config parsing to selfplay_config.rs
- Add ci-resnet.yaml for testing with ResNet network type
- Add MLP input support to Rust backlog

End-to-end test verified: Rust self-play with ResNet config successfully
loads ONNX models, plays games, produces npz replay buffers, hot-reloads
new models, and shuts down gracefully via sentinel file.

f.rename(new_name)
with open(new_name, "rb") as f:
    data = pickle.load(f)
Collaborator Author

@alejandromarcu do you know why data was being read here? I don't see it being used, and Claude removed it when changing to NPZ format. Just want to confirm that's OK.

Collaborator

I think that initially I was not saving the game length, so I had to open the data and get the length of the game, and then when I switched to storing it, I didn't realize. Good catch, Claude!

@adamantivm changed the title from "DRAFT: Rust selfplay integration with play v2" to "Rust selfplay integration with play v2" on Mar 2, 2026
@adamantivm
Collaborator Author

Ok, this is ready for review.
I ran train_v2 using the ci configurations (there is now one that uses resnet too) and it seems to run well, both with python selfplay and rust selfplay.
I reviewed all the changes and they seem fine to me. There are more changes than necessary due to formatting, but in terms of new / changed code it's not that much actually.
Note one thing I'm doing is having the agent write relatively modular commits, so you can review one commit at a time to make it more tractable. I recommend it.

Can you please take a look and let me know if you agree with merging this?
Also, if you have ideas or requests of specific integration tests to try to make sure things work fine, that would be appreciated.

Thanks @alejandromarcu and @jonbinney !

Collaborator

@alejandromarcu alejandromarcu left a comment

Please look at the comments about the arguments; I think they could be improved. Other than that, it looks good. We should also have the same line length configuration for Python so we're not changing the lines back and forth; we need to figure out what happened.


@adamantivm
Collaborator Author

@alejandromarcu comment addressed, can you PTAL again?

@adamantivm
Collaborator Author

how about now @alejandromarcu ?

@adamantivm adamantivm merged commit 9a1ede0 into jonbinney:main Mar 2, 2026
3 checks passed
@adamantivm adamantivm deleted the copilot-worktree-2026-02-25T20-21-57 branch March 2, 2026 22:39