Training, Evaluation & Experiment Commands

Common Commands

# MCTS demos
cargo run -p colver-core --bin mcts_demo --release -- 100
cargo run -p colver-core --bin smart_ismcts_demo --release -- 100

# DD solver benchmark
cargo run -p colver-core --bin dd_bench --release -- 1000

# Python bindings
uv sync
uv run python3 -c "import colver; env = colver.Env(); env.reset()"

# Web frontend
uv run python -m colver.web

# Docker
docker build -t colver . && docker run -p 8000:8000 colver

Belief Network Training

# Generate game replay data (COLVGM01 format, preferred)
cargo run -p colver-core --bin generate_game_data --release --features parallel -- \
  --dmc-model models/dmc_final.bin --games 500000 --output data/games.bin

# V2 training (304-dim, standard architecture)
cargo run -p colver-core --bin train_belief_net --features dmc_train --release -- \
  --replays data/training/games_500k.bin --epochs 200 --batch-size 512 --lr 3e-4 \
  --v2 --augment --cosine-lr --warmup-epochs 10 --val-split 0.05 \
  --output models/belief_net_v2.bin
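The --cosine-lr and --warmup-epochs flags point to a learning-rate schedule with a warmup phase followed by cosine decay. The sketch below rests on two assumptions: a linear warmup up to --lr, then a cosine decay to zero over the remaining epochs. The function name and the exact curve shape are hypothetical, and train_belief_net may implement the schedule differently.

```python
import math


def lr_at_epoch(epoch: int, base_lr: float = 3e-4, warmup_epochs: int = 10,
                total_epochs: int = 200, min_lr: float = 0.0) -> float:
    """Assumed schedule: linear warmup to base_lr, then cosine decay to min_lr."""
    if warmup_epochs > 0 and epoch < warmup_epochs:
        # Linear ramp: base_lr / warmup_epochs at epoch 0, base_lr at the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    decay_epochs = max(1, total_epochs - warmup_epochs)
    progress = min(1.0, (epoch - warmup_epochs) / decay_epochs)
    # Cosine curve from base_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Take the V2 settings above (--lr 3e-4, --warmup-epochs 10, --epochs 200). Under this assumed shape the rate reaches 3e-4 at epoch 10 and is 1.5e-4 at epoch 105.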

# V3 temporal features (380-dim)
cargo run -p colver-core --bin train_belief_net --features dmc_train --release -- \
  --replays data/training/games_500k.bin --epochs 15 --batch-size 512 --lr 3e-4 \
  --v3 --augment --cosine-lr --warmup-epochs 3 --output models/race_v3.bin

# V3 3-class output (observer excluded, 3 classes: left/partner/right)
# count-reg default is now 0.1; 300 epochs for overnight training
cargo run -p colver-core --bin train_belief_net --features dmc_train --release -- \
  --replays data/training/games_500k.bin --epochs 300 --batch-size 512 --lr 3e-4 \
  --cosine-lr --warmup-epochs 10 --seed 42 --v2 --augment \
  --count-reg 0.1 --output models/belief_v3.bin
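--count-reg sets the weight of an auxiliary penalty on card counts. The form below is one plausible version, assumed here rather than taken from train_belief_net. It compares two quantities for each opponent (left/partner/right): the expected number of cards they hold under the 3-class predictions, and their true hand size. The function name and the mean-squared-error form are hypothetical.

```python
from typing import Sequence


def count_reg_penalty(probs: Sequence[Sequence[float]],
                      hand_sizes: Sequence[int],
                      weight: float = 0.1) -> float:
    """Hypothetical card-count regularizer for a 3-class belief head.

    probs[c][k] is the predicted probability that unseen card c is held by
    player k (0 = left, 1 = partner, 2 = right); each row sums to 1.
    hand_sizes[k] is how many unseen cards player k actually holds.
    """
    num_players = len(hand_sizes)
    # Expected number of cards per player under the predicted distribution.
    expected = [sum(row[k] for row in probs) for k in range(num_players)]
    # Mean squared error between expected and true counts, scaled by weight.
    mse = sum((e - n) ** 2 for e, n in zip(expected, hand_sizes)) / num_players
    return weight * mse
```

Hand sizes are known exactly at every point of a deal. A penalty like this therefore pushes the per-card marginals toward a jointly consistent assignment, which per-card cross-entropy alone does not enforce.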

# Architecture variants: --variant cross_attn | suit_shared | var_mlp | aux_loss
# Variable MLP: --variant var_mlp --num-layers 3 --hidden 256
# Card count regularization: --count-reg 0.1
# suit_shared doesn't need --augment (equivariant by construction)
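--augment and the 24× suit augmentation used for bid training both correspond to the 4! = 24 ways of relabeling the four suits. The minimal sketch below makes three assumptions:
  • a hypothetical 32-card encoding (card index = suit * 8 + rank)
  • a trump suit that is relabeled together with the cards
  • hypothetical helper names throughout

```python
from itertools import permutations
from typing import Iterator, List, Sequence, Tuple

NUM_SUITS = 4
RANKS_PER_SUIT = 8  # assumption: 32-card deck, card index = suit * 8 + rank


def permute_card(card: int, perm: Sequence[int]) -> int:
    """Move a card to the same rank in suit perm[suit]."""
    suit, rank = divmod(card, RANKS_PER_SUIT)
    return perm[suit] * RANKS_PER_SUIT + rank


def suit_augmentations(hand: Sequence[int], trump: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (hand, trump) under all 4! = 24 suit relabelings, identity first."""
    for perm in permutations(range(NUM_SUITS)):
        # The trump suit is relabeled with the same permutation so every
        # augmented sample describes a strategically equivalent deal.
        yield sorted(permute_card(c, perm) for c in hand), perm[trump]
```

The suit_shared variant presumably gets the same invariance from weights shared across suits, which is why it does not need these extra samples.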

Belief Evaluation (7 modes)

cargo run -p colver-core --bin belief_eval --release -- --model models/belief_net_v2.bin \
  --replays data/training/games_500k.bin --mode MODE --games 5000

Modes:
  • offline: accuracy, cross-entropy (CE) and calibration
  • match: IS-DD NN vs heuristic
  • diagnose: per-card predictions
  • scenario: hand-crafted tests
  • per_trick: per-trick accuracy
  • ablation: input block importance
  • ensemble: multi-model averaging (pass several models via --model m1.bin,m2.bin)
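The offline mode reports accuracy, cross-entropy and calibration. As a rough guide to what those numbers measure, the sketch below computes them for per-card class distributions. The 10-bin expected calibration error (ECE) is an assumed choice of calibration metric and not necessarily the one belief_eval prints. offline_metrics is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def offline_metrics(probs: Sequence[Sequence[float]], labels: Sequence[int],
                    num_bins: int = 10) -> Tuple[float, float, float]:
    """Return (accuracy, mean cross-entropy, expected calibration error).

    probs[i] is a predicted distribution over classes for example i;
    labels[i] is the index of the true class.
    """
    n = len(labels)
    correct = 0
    ce = 0.0
    bins = [[0, 0.0, 0] for _ in range(num_bins)]  # [count, sum_conf, hits]
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class
        conf = p[pred]
        hit = int(pred == y)
        correct += hit
        ce -= math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        b = min(int(conf * num_bins), num_bins - 1)  # confidence bin
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += hit
    # ECE: bin-size-weighted gap between mean confidence and accuracy.
    ece = sum(abs(s / c - h / c) * c / n for c, s, h in bins if c)
    return correct / n, ce / n, ece
```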

DMC Card Play Training

# Rust training (candle, ~474 steps/s on 4090)
cargo run -p colver-core --bin train_dmc --features dmc_train --release -- \
  --num-envs 256 --steps 35000000 \
  --bid-model models/bid_nn_final.bin \
  --nn-bid-start 0.75 --nn-bid-end 0.95 --nn-bid-anneal-steps 20000000 \
  --eval-freq 1000000 \
  --eval-random-matches 100 \
  --eval-isdd-matches 10 --eval-isdd-time-ms 20 \
  --eval-checkpoint models/dmc_35.bin --eval-checkpoint-matches 50
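--nn-bid-start, --nn-bid-end and --nn-bid-anneal-steps read as a schedule for how often the NN bid model (--bid-model) does the bidding during self-play. The sketch below assumes two things: a linear anneal, and a per-deal coin flip between the NN and some other bidder. Both are assumptions. The fallback name "heuristic" and both function names are hypothetical.

```python
import random


def nn_bid_prob(step: int, start: float = 0.75, end: float = 0.95,
                anneal_steps: int = 20_000_000) -> float:
    """Assumed linear anneal from start to end, then held at end."""
    if anneal_steps <= 0:
        return end
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return start + (end - start) * frac


def pick_bidder(step: int, rng: random.Random) -> str:
    """Per-deal coin flip; "heuristic" is a hypothetical name for the fallback bidder."""
    return "nn" if rng.random() < nn_bid_prob(step) else "heuristic"
```

With the values in the command above, the NN bidder would handle 75% of deals at step 0 and 85% at 10M steps. From 20M steps to the end of the 35M-step run it would handle 95%.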

# Python training (slower, ~140 steps/s)
PYTHONPATH=scripts/training uv run python scripts/training/train_dmc.py --num-envs 256 --steps 20000000

# Evaluation
uv run python scripts/analysis/eval_dmc.py models/dmc_final.pt \
  --games 200 --baseline smart --time-ms 20 --both-sides

# Export PyTorch weights to Rust binary format
python scripts/export/export_dmc_weights.py models/dmc_final.pt models/dmc_final.bin
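DMC here presumably refers to Deep Monte Carlo, the method popularized by DouZero. In that method a Q-network over (state, action) pairs is regressed directly onto the final outcome of each deal. The sketch below shows its two core pieces: epsilon-greedy action selection and Monte Carlo targets. The trajectory layout, the seat-to-team mapping (seats 0/2 vs 1/3) and all function names are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Index of a legal action: uniform with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def mc_targets(trajectory: Sequence[Tuple[int, str, str]],
               team_return: Dict[int, float]) -> List[Tuple[str, str, float]]:
    """Label every (seat, state, action) step with its team's final return.

    Assumes seats 0/2 and 1/3 are partners (team = seat % 2). There is no
    bootstrapping or discounting: the target is simply the deal's outcome.
    """
    return [(state, action, team_return[seat % 2])
            for seat, state, action in trajectory]


def mse_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) predictions and Monte Carlo targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
```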

NN Bidding Training

# Bid a Dede (v2, default): 3×512, DD solver + 24× suit augmentation
cargo run -p colver-core --bin train_bid_nn --features dmc_train --release -- \
  --hidden 512 --layers 3 --steps 20000000 --pool-file data/pools/dd_2.5M.bin

# Bid a Doudou (v1, legacy): 2×256, DouZero self-play
cargo run -p colver-core --bin train_bid_nn --features dmc_train --release -- \
  --num-envs 64 --steps 5000000 --pool-size 1000000

Training runs in two phases. Phase 1 pre-solves a deal pool (1M deals × 4 suits). Phase 2 trains a Dueling DQN with prioritized experience replay (PER) and opponent diversity. BidNet::load auto-detects the hidden size (it tries 256, 512 and 1024).
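Two of the Phase 2 ingredients have standard textbook forms. The sketch below shows the dueling aggregation Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), followed by proportional prioritized-replay sampling with importance-sampling weights. alpha = 0.6 and beta = 0.4 are common defaults, not values read from train_bid_nn, and the function names are hypothetical.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine streams as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                      eps: float = 1e-6) -> List[float]:
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |delta_i| + eps."""
    prios = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(prios)
    return [p / total for p in prios]


def is_weights(probs: Sequence[float], beta: float = 0.4) -> List[float]:
    """Importance-sampling weights (N * P(i))^-beta, normalized by their max."""
    n = len(probs)
    w = [(n * p) ** (-beta) for p in probs]
    m = max(w)
    return [x / m for x in w]
```

Subtracting the mean advantage keeps V and A identifiable. The importance-sampling weights correct the bias that prioritized sampling introduces into the loss.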

NN Value Function Training (feature nn, parked)

# Generate training data
cargo run -p colver-core --bin generate_value_data --release --features nn -- \
  10000 data/training/value_train.bin --fast

# Train (PyTorch)
python scripts/training/train_value_net.py --data data/training/value_train.bin --output models/value_net.bin

# Evaluate
cargo run -p colver-core --bin nn_experiment --release --features nn -- \
  models/value_net.bin 50 --data data/training/value_train.bin

PyPI Publishing

Published as colver via GitHub Actions with trusted publishing.

git tag v0.2.1 && git push origin v0.2.1
# Builds wheels for: x86_64-linux, aarch64-linux, x86_64-macos, aarch64-macos, x86_64-windows

Experiment Binaries

All binaries: cargo run -p colver-core --bin NAME --release -- ARGS

Binary                Feature    Description
bench                 -          Performance benchmark (~1.3M rollouts/sec)
mcts_demo             rand       MCTS vs random demo
smart_ismcts_demo     rand       Smart IS-MCTS vs random + vs naive
oracle_experiment     rand       Bid achievability with perfect-info MCTS
bidding_experiment    rand       Head-to-head bidding strategies
match_experiment      rand       Full match play (first to 2000 pts)
bid_tournament        rand       Round-robin parameterized bidding
bid_debug             rand       Side-by-side bidding printout
bid_compare           rand       Bidding comparison with DD oracle
bid_nn_eval           rand       Evaluate bid NN vs heuristic bidders
bid_nn_tournament     rand       NN bid round-robin across play methods
strength_experiment   rand       Rollout policy comparison, D×I sweep
maxi_diagnose         rand       Maxi vs DMC play-by-play diagnostic
v2_tournament         rand       V2 bidding fine-tune tournament
dd_bench              -          DD solver benchmark
dd_calibrate          rand       DD bidding calibration
isdd_sweep            rand       IS-DD parameter sweep (count/time/soft)
generate_belief_data  rand       Belief training data (COLVBL01 format)
generate_game_data    rand       Game replays (COLVGM01 format)
generate_value_data   nn         NN value function training data
train_belief_net      dmc_train  Belief network training
train_bid_nn          dmc_train  NN bidding training
train_dmc             dmc_train  DMC card play training
belief_eval           rand       Belief network evaluation (7 modes)
nn_experiment         nn         NN value function evaluation

IS-DD Sweep Results

Recommended web configurations, based on a sweep of 200 deals against the DouDou35 DMC play model:

  • 20ms time-limited + soft inference: ~48% vs DouDou35, ~230ms/deal
  • 50ms time-limited: ~57%, 515ms/deal (higher quality)
  • Gains plateau sharply after D=8 determinizations
  • Soft inference is worth it at D≥16 (+3.5% for 7% more compute)
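IS-DD is read here as information-set double-dummy search. It samples D determinizations of the hidden cards, scores each legal move with the DD solver on every sampled deal, and plays the move with the best average score. The sketch below assumes that structure and labels its other assumptions as follows:
  • dd_score is a hypothetical stand-in for the real DD solver.
  • The greedy dealing routine only approximates sampling from the beliefs.
  • "Soft" inference is modeled as weighting each card's placement by belief probabilities.
  • All function names are hypothetical.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Sequence


def sample_deal(unseen: Sequence[int], hand_sizes: Dict[int, int],
                belief: Optional[Dict[int, Dict[int, float]]],
                rng: random.Random) -> Dict[int, List[int]]:
    """Deal unseen cards to opponents, respecting hand sizes.

    Cards are placed greedily in random order, which only approximates
    the true posterior. Requires sum(hand_sizes.values()) == len(unseen).
    """
    room = dict(hand_sizes)
    deal: Dict[int, List[int]] = {seat: [] for seat in hand_sizes}
    cards = list(unseen)
    rng.shuffle(cards)
    for card in cards:
        seats = [s for s, r in room.items() if r > 0]
        if belief is None:
            weights = [1.0] * len(seats)  # uniform over seats with room
        else:
            # Soft: weight by belief, with a tiny floor so weights never all vanish.
            weights = [belief.get(card, {}).get(s, 0.0) + 1e-9 for s in seats]
        seat = rng.choices(seats, weights=weights)[0]
        deal[seat].append(card)
        room[seat] -= 1
    return deal


def isdd_choose(legal_moves: Sequence[int], unseen: Sequence[int],
                hand_sizes: Dict[int, int],
                belief: Optional[Dict[int, Dict[int, float]]],
                dd_score: Callable[[Dict[int, List[int]], int], float],
                max_dets: int = 16, time_ms: float = 20.0, seed: int = 0) -> int:
    """Average dd_score per move over sampled deals; stop at max_dets or time_ms."""
    rng = random.Random(seed)
    totals = {m: 0.0 for m in legal_moves}
    deadline = time.perf_counter() + time_ms / 1000.0
    n = 0
    # Always solve at least one determinization, then respect both limits.
    while n < max_dets and (n == 0 or time.perf_counter() < deadline):
        deal = sample_deal(unseen, hand_sizes, belief, rng)
        for m in legal_moves:
            totals[m] += dd_score(deal, m)
        n += 1
    return max(legal_moves, key=lambda m: totals[m] / n)
```

In these terms, a time-limited configuration pairs a small time_ms with a large max_dets, and a fixed-D configuration does the reverse.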

Note: The default play model is now DouDou50 (411-dim canonical ResNet, 50M steps). DouDou35 (415-dim legacy, 35M steps) remains available for backward compatibility.