Rust selfplay integration with play v2 by adamantivm · Pull Request #347 · jonbinney/deep_rabbit_hole

adamantivm · 2026-03-02T14:18:56Z

Summary

This PR connects the Rust self-play binary with the Python v2 training pipeline, enabling the trainer to use fast Rust-based self-play instead of Python workers. The integration is activated with a single --rust-selfplay flag.

Changes

1. Switch Python to npz format (`5e68672`)

alphazero.py: Changed end_game_batch_and_save_replay_buffers from pickle to npz with Rust-compatible field names (input_arrays, policies, action_masks, values, players)
trainer.py: Updated replay reading from .pkl to .npz, with conversion to per-sample dicts at read time
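
As a rough illustration of the npz round trip described above, the sketch below stacks per-sample dicts into one array per field, writes them with `np.savez`, and converts back to per-sample dicts at read time. The five field names come from this PR; the helper names `save_replay` and `load_replay` and the array shapes are assumptions made for the example, not the actual code in alphazero.py or trainer.py.

```python
import io

import numpy as np

# Field names shared by the Python and Rust writers (taken from the PR description).
FIELDS = ("input_arrays", "policies", "action_masks", "values", "players")


def save_replay(file, samples):
    """Stack per-sample dicts into one array per field and write an npz archive.

    Hypothetical helper: `file` may be a path or a binary file-like object.
    """
    arrays = {name: np.stack([s[name] for s in samples]) for name in FIELDS}
    np.savez(file, **arrays)


def load_replay(file):
    """Read an npz archive back into a list of per-sample dicts."""
    with np.load(file) as data:
        columns = {name: data[name] for name in FIELDS}
    n_samples = len(columns["values"])
    return [{name: columns[name][i] for name in FIELDS} for i in range(n_samples)]


def make_sample(player, value):
    """Build a toy sample; the shapes here are illustrative assumptions only."""
    return {
        "input_arrays": np.zeros((2, 3, 3), dtype=np.float32),
        "policies": np.array([0.25, 0.75], dtype=np.float32),
        "action_masks": np.array([True, True]),
        "values": np.float32(value),
        "players": np.int32(player),
    }


buffer = io.BytesIO()
save_replay(buffer, [make_sample(0, 1.0), make_sample(1, -1.0)])
buffer.seek(0)
restored = load_replay(buffer)
```

Storing one array per field keeps the file readable from Rust without a pickle dependency, at the cost of the small conversion step on the Python read side.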

2. Add `--model-version` CLI parameter (`6d060bb`)

selfplay.rs: Added --model-version CLI arg to stamp replay buffer metadata with the correct model version

3. Add continuous mode with model hot-reload (`903f8d7`)

selfplay.rs: Added --continuous, --latest-model-yaml, --shutdown-file CLI args
selfplay_config.rs: Added LatestModelYaml struct and load_latest_model() function
Rust binary polls latest.yaml between games, reloads ONNX model on version change, exits on shutdown sentinel file
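
The decision made between games can be sketched as follows. This is a Python illustration of the control flow only; the real logic lives in the Rust binary. The `version:` key in latest.yaml and the helper names `read_version` and `next_action` are assumptions for the example, since the PR does not show the file's schema.

```python
from pathlib import Path


def read_version(latest_yaml: Path):
    """Return the integer after a `version:` key, or None if missing or unreadable.

    Assumes a flat `key: value` file; the real latest.yaml schema may differ.
    """
    try:
        text = latest_yaml.read_text()
    except FileNotFoundError:
        return None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "version":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def next_action(latest_yaml: Path, shutdown_file: Path, current_version: int):
    """Decide what a continuous-mode worker should do before the next game."""
    # The shutdown sentinel takes priority over everything else.
    if shutdown_file.exists():
        return ("shutdown", current_version)
    latest = read_version(latest_yaml)
    # Reload only when a readable version differs from the one in use.
    if latest is not None and latest != current_version:
        return ("reload", latest)
    return ("play", current_version)
```

Checking the sentinel first means a worker never starts a reload it would immediately abandon.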

4. Add `--rust-selfplay` flag to Python trainer (`182003f`)

train_v2.py: Added --rust-selfplay arg that spawns Rust self-play processes in continuous mode
config.py: Added rust_selfplay_binary field to SelfPlayConfig
Auto-enables ONNX export when Rust self-play is active
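
A minimal sketch of how a trainer might launch such workers is shown below. The flag names come from this PR, but everything else is an assumption: that `--latest-model-yaml` and `--shutdown-file` each take a path, that no other arguments are required, and the helper names `build_selfplay_command` and `spawn_workers`. The actual code in train_v2.py may be organized differently.

```python
import subprocess
from pathlib import Path


def build_selfplay_command(binary: Path, latest_yaml: Path, shutdown_file: Path):
    """Build the argv list for one continuous-mode self-play worker.

    Assumption: the two path flags take a single value each; any other
    arguments the real binary needs (config, output directory) are omitted.
    """
    return [
        str(binary),
        "--continuous",
        "--latest-model-yaml", str(latest_yaml),
        "--shutdown-file", str(shutdown_file),
    ]


def spawn_workers(n_workers: int, binary: Path, latest_yaml: Path, shutdown_file: Path):
    """Start n_workers processes and return their Popen handles."""
    command = build_selfplay_command(binary, latest_yaml, shutdown_file)
    return [subprocess.Popen(command) for _ in range(n_workers)]


def stop_workers(workers, shutdown_file: Path):
    """Request a graceful stop via the sentinel file, then wait for every worker."""
    shutdown_file.touch()
    for worker in workers:
        worker.wait()
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.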

5. Fix race conditions + ResNet CI config (`8e6c94e`)

trainer.py: Moved LatestModel.write() after ONNX export to prevent consumers from reading yaml before model files exist
selfplay.rs: Added wait-for-ONNX-file logic in continuous mode
selfplay_config.rs: Added NetworkType enum and config parsing for alphazero.network.type
Added ci-resnet.yaml for testing with ResNet network type
Added MLP input support to Rust backlog
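
The ordering fix can be illustrated with a small sketch: the producer writes the model file first and publishes the pointer file last, and the consumer still waits for the model file before loading it. The file names, the yaml keys, and the helpers `publish_model` and `wait_for_file` are hypothetical. The atomic rename via `os.replace` is an extra safeguard added in this example and is not something the PR states that it does.

```python
import os
import time
from pathlib import Path


def publish_model(model_dir: Path, version: int, onnx_bytes: bytes) -> Path:
    """Write the model file first, then publish latest.yaml pointing at it."""
    model_dir.mkdir(parents=True, exist_ok=True)
    onnx_path = model_dir / f"model_{version}.onnx"  # hypothetical naming scheme
    onnx_path.write_bytes(onnx_bytes)  # step 1: the model file exists on disk

    # Step 2: write the pointer to a temp file and rename it into place, so a
    # reader sees either the old latest.yaml or the complete new one.
    tmp_path = model_dir / "latest.yaml.tmp"
    tmp_path.write_text(f"version: {version}\nfilename: {onnx_path.name}\n")
    os.replace(tmp_path, model_dir / "latest.yaml")
    return onnx_path


def wait_for_file(path: Path, timeout_s: float = 5.0, poll_s: float = 0.05) -> bool:
    """Poll until `path` exists or the timeout expires; return whether it exists."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_s)
    return path.exists()
```

The consumer-side wait covers the initial start, where a worker may come up before the trainer has exported any model at all.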

Usage

# Build the Rust binary
cd deep_quoridor/rust && cargo build --release --features binary --bin selfplay

# Run training with Rust self-play
cd deep_quoridor/src && python train_v2.py ../experiments/ci-resnet.yaml --rust-selfplay

Testing

All 118 Rust unit tests pass
End-to-end test verified with ResNet config (ci-resnet.yaml):
- Rust workers load initial ONNX model, play games, produce npz replay buffers
- Python trainer consumes buffers, trains models, exports ONNX
- Rust workers detect model hot-reloads (v0→v1→v2)
- Graceful shutdown via sentinel file works correctly

Deferred

MLP network input support in Rust (tracked in deep_quoridor/rust/backlog.md)

- Move LatestModel.write() after ONNX export in trainer.py to prevent Rust self-play from reading latest.yaml before model files exist
- Add wait-for-ONNX-file logic in Rust continuous mode (both initial model wait and model reload check)
- Add NetworkType enum and network config parsing to selfplay_config.rs
- Add ci-resnet.yaml for testing with ResNet network type
- Add MLP input support to Rust backlog

End-to-end test verified: Rust self-play with ResNet config successfully loads ONNX models, plays games, produces npz replay buffers, hot-reloads new models, and shuts down gracefully via sentinel file.

adamantivm · 2026-03-02T14:20:12Z

deep_quoridor/src/v2/trainer.py


            f.rename(new_name)
-            with open(new_name, "rb") as f:
-                data = pickle.load(f)


@alejandromarcu do you know why data was being read here? I don't see it being used, and Claude removed it when changing to NPZ format. Just want to confirm that's OK.

adamantivm · 2026-03-02T14:48:06Z

Ok, this is ready for review.
I ran train_v2 using the CI configurations (there is now one that uses ResNet too) and it seems to run well, both with Python self-play and Rust self-play.
I reviewed all the changes and they seem fine to me. There are more changes than necessary due to formatting, but in terms of new or changed code it's not that much, actually.
Note that I'm having the agent write relatively modular commits, so you can review one commit at a time to make it more tractable; I recommend it.

Can you please take a look and let me know if you agree with merging this?
Also, if you have ideas or requests for specific integration tests to make sure things work fine, that would be appreciated.

Thanks @alejandromarcu and @jonbinney!

alejandromarcu

Please look at the comments about the arguments; I think they could be improved. Other than that, it looks good. We should also use the same line-length configuration for Python so we're not changing lines back and forth; we need to figure out what happened.

alejandromarcu · 2026-03-02T18:41:41Z

deep_quoridor/src/v2/trainer.py


            f.rename(new_name)
-            with open(new_name, "rb") as f:
-                data = pickle.load(f)


I think that initially I was not saving the game length, so I had to open the data and get the length of the game, and then when I switched to storing it, I didn't realize. Good catch, Claude!

adamantivm · 2026-03-02T19:45:39Z

@alejandromarcu comment addressed, can you please take another look?

adamantivm · 2026-03-02T20:25:44Z

How about now, @alejandromarcu?

Julian Cerruti added 9 commits February 26, 2026 20:13

manual: Use terminal for commands

f0cd25d

vibe: Switch Python self-play and trainer to use npz instead of pickle

5951093

vibe: Add --model-version CLI parameter to Rust self-play binary

1b13936

vibe: Add continuous mode with model hot-reload to Rust self-play

f6825ad

vibe: Add --rust-selfplay flag to Python training entry point

5184aa6

vibe: Add PR summary for Rust self-play integration

47e5035

Update training arguments in launch configuration for reduced epochs

08d5440

manual: save feature plan

c270c18

Julian Cerruti added 2 commits March 2, 2026 11:23

manual: rule to organize agent commits

752395a

remove unnecessary updates

a3e31ae

adamantivm changed the title from "DRAFT: Rust selfplay integration with play v2" to "Rust selfplay integration with play v2" Mar 2, 2026

adamantivm requested review from alejandromarcu and jonbinney March 2, 2026 14:45

alejandromarcu approved these changes Mar 2, 2026

vibe: rework --rust-selfplay as --selfplay-program

5acb85c

alejandromarcu requested changes Mar 2, 2026

vibe: address reviewer comments on selfplay config args

dec07a5

alejandromarcu approved these changes Mar 2, 2026

adamantivm merged commit 9a1ede0 into jonbinney:main Mar 2, 2026
3 checks passed

adamantivm deleted the copilot-worktree-2026-02-25T20-21-57 branch March 2, 2026 22:39

adamantivm merged 13 commits into jonbinney:main from adamantivm:copilot-worktree-2026-02-25T20-21-57