Rust selfplay integration with play v2 #347
- Move LatestModel.write() after ONNX export in trainer.py to prevent Rust self-play from reading latest.yaml before model files exist
- Add wait-for-ONNX-file logic in Rust continuous mode (both initial model wait and model reload check)
- Add NetworkType enum and network config parsing to selfplay_config.rs
- Add ci-resnet.yaml for testing with ResNet network type
- Add MLP input support to Rust backlog

End-to-end test verified: Rust self-play with ResNet config successfully loads ONNX models, plays games, produces npz replay buffers, hot-reloads new models, and shuts down gracefully via sentinel file.
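The ordering fix in the first bullet can be illustrated with a small Python sketch. It is not the code from trainer.py: the function name `publish_model`, the `model_<version>.onnx` naming scheme, and the `version` / `filename` keys in latest.yaml are all assumptions made for illustration. The idea is only that the model file is fully written before the pointer file is updated, and that the pointer is swapped in with a rename so a reader never sees it half-written.

```python
import os
import tempfile
from pathlib import Path


def publish_model(run_dir: Path, version: int, onnx_bytes: bytes) -> Path:
    """Write the model file first, then point latest.yaml at it."""
    # Hypothetical naming scheme; the real trainer may name files differently.
    model_path = run_dir / f"model_{version}.onnx"
    model_path.write_bytes(onnx_bytes)  # stand-in for the real ONNX export

    # Write the pointer to a temp file in the same directory and rename it
    # into place, so a reader never observes a partially written latest.yaml.
    fd, tmp_name = tempfile.mkstemp(dir=run_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        # Hypothetical keys; the real latest.yaml schema is not shown here.
        tmp.write(f"version: {version}\nfilename: {model_path.name}\n")
    latest = run_dir / "latest.yaml"
    os.replace(tmp_name, latest)
    return latest
```

With this ordering, any process that can read a given version from latest.yaml should also find the model file it refers to, assuming both live on the same filesystem.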
f.rename(new_name)
with open(new_name, "rb") as f:
    data = pickle.load(f)
@alejandromarcu do you know why data was being read here? I don't see it being used, and Claude removed it when changing to NPZ format. Just want to confirm that's OK.
I think that initially I was not saving the game length, so I had to open the data and get the length of the game, and then when I switched to storing it, I didn't realize. Good catch, Claude!
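For context on the NPZ change discussed in this thread, here is a minimal Python sketch of a replay file that uses the five field names listed in the PR description (input_arrays, policies, action_masks, values, players). The helper names `save_replay` and `load_replay`, and the array shapes and dtypes, are assumptions for illustration and not the code in alphazero.py or trainer.py. It also shows why no separate load is needed just to learn the game length: in this layout it is the leading dimension of every array.

```python
import io

import numpy as np

# Field names taken from the PR description; everything else is illustrative.
FIELDS = ["input_arrays", "policies", "action_masks", "values", "players"]


def save_replay(file, samples):
    """Stack per-sample dicts into one array per field and write an npz archive."""
    np.savez(file, **{k: np.array([s[k] for s in samples]) for k in FIELDS})


def load_replay(file):
    """Read the archive back and split it into one dict per sample."""
    with np.load(file) as data:
        arrays = {k: data[k] for k in data.files}
    n = len(arrays["values"])  # number of samples is the leading dimension
    return [{k: v[i] for k, v in arrays.items()} for i in range(n)]
```

Both helpers accept either a path or a file-like object such as `io.BytesIO`, since `np.savez` and `np.load` support both.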
Ok, this is ready for review. Can you please take a look and let me know if you agree with merging this? Thanks @alejandromarcu and @jonbinney!
alejandromarcu left a comment
Please look at the comments about the arguments; I think they could be improved. Other than that, it looks good. We should also use the same line-length configuration for Python so we're not changing lines back and forth; we need to figure out what happened.
@alejandromarcu comment addressed, can you PTAL again?

how about now @alejandromarcu?
Summary
This PR connects the Rust self-play binary with the Python v2 training pipeline, enabling the trainer to use fast Rust-based self-play instead of Python workers. The integration is activated with a single --rust-selfplay flag.

Changes

1. Switch Python to npz format (5e68672)
- alphazero.py: Changed end_game_batch_and_save_replay_buffers from pickle to npz with Rust-compatible field names (input_arrays, policies, action_masks, values, players)
- trainer.py: Updated replay reading from .pkl to .npz, with conversion to per-sample dicts at read time

2. Add --model-version CLI parameter (6d060bb)
- selfplay.rs: Added --model-version CLI arg to stamp replay buffer metadata with the correct model version

3. Add continuous mode with model hot-reload (903f8d7)
- selfplay.rs: Added --continuous, --latest-model-yaml, --shutdown-file CLI args
- selfplay_config.rs: Added LatestModelYaml struct and load_latest_model() function
- Reads latest.yaml between games, reloads ONNX model on version change, exits on shutdown sentinel file

4. Add --rust-selfplay flag to Python trainer (182003f)
- train_v2.py: Added --rust-selfplay arg that spawns Rust self-play processes in continuous mode
- config.py: Added rust_selfplay_binary field to SelfPlayConfig

5. Fix race conditions + ResNet CI config (8e6c94e)
- trainer.py: Moved LatestModel.write() after ONNX export to prevent consumers from reading yaml before model files exist
- selfplay.rs: Added wait-for-ONNX-file logic in continuous mode
- selfplay_config.rs: Added NetworkType enum and config parsing for alphazero.network.type
- Added ci-resnet.yaml for testing with ResNet network type

Usage

Testing

ci-resnet.yaml):

Deferred

deep_quoridor/rust/backlog.md)
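The continuous mode from change 3, combined with the wait-for-ONNX-file fix from change 5, can be pictured with a short Python sketch. The real logic lives in the Rust binary and may differ; this is only an illustration of the control flow. The names `read_latest` and `continuous_selfplay`, the `version` / `filename` keys, and the `play_game` / `load_model` callbacks are hypothetical.

```python
import time
from pathlib import Path


def read_latest(latest_yaml: Path):
    """Parse two hypothetical keys, 'version' and 'filename', from latest.yaml."""
    fields = dict(
        line.split(": ", 1)
        for line in latest_yaml.read_text().splitlines()
        if ": " in line
    )
    return int(fields["version"]), latest_yaml.parent / fields["filename"]


def continuous_selfplay(latest_yaml, shutdown_file, play_game, load_model, poll_s=0.1):
    """Play games until the sentinel file appears, reloading on version change."""
    version, model = None, None
    while not shutdown_file.exists():
        if latest_yaml.exists():
            new_version, path = read_latest(latest_yaml)
            # Only switch once the referenced model file is actually on disk.
            if new_version != version and path.exists():
                model, version = load_model(path), new_version
        if model is None:
            time.sleep(poll_s)  # initial wait: no usable model yet
            continue
        play_game(model, version)
```

Checking the sentinel and latest.yaml only between games keeps each game tied to a single model version, which matches the purpose of stamping replay metadata with --model-version.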