- PPO implementation based on Ray.
- Medium level: 16 x 16, 40 mines. The Expert-level version is on the `expert` branch.
- ~62% winning rate (inference with the max-probability policy); with a stochastic policy the rate is only ~53%.
- With clicking reward (every valid click earns a constant reward of +0.1).
- Medium checkpoint: `policy_value_1250000.pt`
To run inference:

```shell
cd inference
python inference.py
```

To estimate the winning rate, run `python stat_win_rate.py`.
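Conceptually, a win-rate estimate rolls out many episodes and counts wins, choosing actions either greedily (argmax over the policy's action probabilities, as in the ~62% figure) or by sampling (the ~53% figure). The sketch below is a generic Monte-Carlo estimator with hypothetical `make_env`/`policy` interfaces, not the actual `stat_win_rate.py` code.

```python
import random


def select_action(probs, greedy=True, rng=random):
    """Pick a cell index from a list of action probabilities."""
    if greedy:
        # max-probability policy: deterministic argmax
        return max(range(len(probs)), key=lambda i: probs[i])
    # stochastic policy: sample a cell proportionally to its probability
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def estimate_win_rate(make_env, policy, n_episodes=1000, greedy=True):
    """Monte-Carlo estimate: fraction of episodes that end in a win.
    Assumes a hypothetical env where step(action) -> (obs, done, won)."""
    wins = 0
    for _ in range(n_episodes):
        env = make_env()
        obs, done, won = env.reset(), False, False
        while not done:
            action = select_action(policy(obs), greedy=greedy)
            obs, done, won = env.step(action)
        wins += won
    return wins / n_episodes
```

For example, `select_action([0.1, 0.7, 0.2])` returns index `1` under the greedy policy.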
- einops==0.8.1
- gym==0.26.2
- gymnasium==1.0.0
- numpy==2.1.2
- ray==2.42.1
- six==1.17.0
- torch==2.6.0+cu118
- Ensure Python version == 3.10.

```shell
pip install -r requirements
mkdir ckpt
python main.py
```
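`main.py` trains the agent with PPO. For reference, the core of PPO is the clipped surrogate objective; the function below is a generic per-sample sketch of that standard formula (to be averaged over a batch and minimized), not this repository's exact implementation.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate loss.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    PPO maximizes min(ratio * A, clip(ratio, 1-eps, 1+eps) * A);
    returned here negated so it can be minimized as a loss.
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return -min(unclipped, clipped)
```

With identical old and new policies (`logp_new == logp_old`), the ratio is 1, so the loss is simply the negative advantage; large ratios are clipped at `1 + clip_eps`, which prevents the update from moving too far from the old policy in a single step.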
- The minesweeper env is borrowed from aylint/gym-minesweeper.
- The parallel PPO implementation is based on the Ray framework.

