- PPO implementation based on Ray.
- Medium level: 16 x 16, 40 mines. The Expert-level version is on the `expert` branch.
- ~62% winning rate (inference with the max-probability policy); with a stochastic policy the rate is only ~53%.
- With clicking reward (every valid click earns a constant reward of +0.1).
- Medium checkpoint: `policy_value_1250000.pt`
To run inference:

```shell
cd inference
python inference.py
```

To estimate the winning rate, run `python stat_win_rate.py`.
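Conceptually, a win-rate estimate rolls out many episodes and counts wins, choosing actions either greedily (argmax over the policy's action probabilities, as in the ~62% figure) or by sampling (the ~53% figure). The sketch below is a generic Monte-Carlo estimator with hypothetical `make_env`/`policy` interfaces, not the actual `stat_win_rate.py` code.

```python
import random


def select_action(probs, greedy=True, rng=random):
    """Pick a cell index from a list of action probabilities."""
    if greedy:
        # max-probability policy: deterministic argmax
        return max(range(len(probs)), key=lambda i: probs[i])
    # stochastic policy: sample a cell proportionally to its probability
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def estimate_win_rate(make_env, policy, n_episodes=1000, greedy=True):
    """Monte-Carlo estimate: fraction of episodes that end in a win.
    Assumes a hypothetical env where step(action) -> (obs, done, won)."""
    wins = 0
    for _ in range(n_episodes):
        env = make_env()
        obs, done, won = env.reset(), False, False
        while not done:
            action = select_action(policy(obs), greedy=greedy)
            obs, done, won = env.step(action)
        wins += won
    return wins / n_episodes
```

For example, `select_action([0.1, 0.7, 0.2])` returns index `1` under the greedy policy.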
- einops==0.8.1
- gym==0.26.2
- gymnasium==1.0.0
- numpy==2.1.2
- ray==2.42.1
- six==1.17.0
- torch==2.6.0+cu118
- Ensure Python version == 3.10.

```shell
pip install -r requirements
mkdir ckpt
python main.py
```
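`main.py` trains the agent with PPO. For reference, the core of PPO is the clipped surrogate objective; the function below is a generic per-sample sketch of that standard formula (to be averaged over a batch and minimized), not this repository's exact implementation.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate loss.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    PPO maximizes min(ratio * A, clip(ratio, 1-eps, 1+eps) * A);
    returned here negated so it can be minimized as a loss.
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return -min(unclipped, clipped)
```

With identical old and new policies (`logp_new == logp_old`), the ratio is 1, so the loss is simply the negative advantage; large ratios are clipped at `1 + clip_eps`, which prevents the update from moving too far from the old policy in a single step.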
- The minesweeper env is borrowed from aylint/gym-minesweeper.
- The parallel PPO implementation is based on the Ray framework.

