
MineSweeper Agent PPO

Features

  1. PPO implementation parallelized with the Ray framework (a rough rollout sketch follows this list).

  2. Medium level: 16 x 16 board with 40 mines; an expert-level version is on the expert branch.

  3. ~62% win rate with the greedy (max-probability) policy at inference time; with the stochastic policy the rate is only ~53%.
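
As a rough illustration of the Ray-based parallelism mentioned in item 1, the sketch below spawns remote workers that each play episodes and return trajectories for a centralized PPO update. All names here (RolloutWorker, make_env, the placeholder random policy) are assumptions for illustration and do not mirror the repo's main.py.

```python
import ray

ray.init(ignore_reinit_error=True)


@ray.remote
class RolloutWorker:
    """Hypothetical worker that plays full episodes with the current policy."""

    def __init__(self, make_env):
        self.env = make_env()

    def run_episode(self):
        # Collect one trajectory of (obs, action, reward) tuples.
        # A real implementation would query the policy network here;
        # random actions are a stand-in to keep the sketch self-contained.
        trajectory = []
        obs, _ = self.env.reset()
        done = False
        while not done:
            action = self.env.action_space.sample()
            obs, reward, terminated, truncated, _ = self.env.step(action)
            done = terminated or truncated
            trajectory.append((obs, action, reward))
        return trajectory


# Usage sketch: gather rollouts from several workers in parallel, then run a
# PPO update on the combined batch (the update itself is omitted here).
# workers = [RolloutWorker.remote(make_env) for _ in range(8)]
# batch = ray.get([w.run_episode.remote() for w in workers])
```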

Pretrained Checkpoint

  1. Trained with a click reward (every valid click earns a constant reward of +0.1); a minimal reward-shaping sketch follows this list.
  2. medium-checkpoint: policy_value_1250000.pt
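
The click reward described above can be approximated with a thin wrapper around the environment. This is a minimal sketch, assuming a Gymnasium-style 5-tuple step() and treating any step that does not hit a mine as a valid click; the wrapper name and the is_success info key are placeholders, not the repo's actual code.

```python
import gymnasium as gym


class ClickRewardWrapper(gym.Wrapper):
    """Hypothetical wrapper that adds a constant +0.1 bonus per valid click."""

    def __init__(self, env, click_bonus: float = 0.1):
        super().__init__(env)
        self.click_bonus = click_bonus

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Assumption: a "valid click" is any step that does not end the
        # episode by hitting a mine (winning terminal steps still count).
        hit_mine = terminated and not info.get("is_success", False)
        if not hit_mine:
            reward += self.click_bonus
        return obs, reward, terminated, truncated, info
```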

Training Stats

[Figure: TensorBoard training curves]

Inference

  1. cd inference
  2. python inference.py

To estimate the win rate, run python stat_win_rate.py.
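
The gap between the ~62% and ~53% win rates reported under Features comes from how an action is chosen from the policy at inference time. Below is a generic sketch of the two modes, assuming the checkpointed network outputs per-cell logits; the function is illustrative and is not the repo's actual inference API.

```python
import torch


def select_action(logits: torch.Tensor, greedy: bool = True) -> int:
    """Pick a cell to click from per-cell policy logits.

    greedy=True  -> max-probability policy (reported ~62% win rate)
    greedy=False -> sample from the stochastic policy (reported ~53% win rate)
    """
    probs = torch.softmax(logits.flatten(), dim=0)
    if greedy:
        return int(torch.argmax(probs))
    return int(torch.multinomial(probs, num_samples=1))
```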

Requirements

  • einops==0.8.1
  • gym==0.26.2
  • gymnasium==1.0.0
  • numpy==2.1.2
  • ray==2.42.1
  • six==1.17.0
  • torch==2.6.0+cu118

Train your own model

  1. Ensure the Python version is 3.10.
  2. pip install -r requirements.txt
  3. mkdir ckpt
  4. python main.py

Result

[Demo: gameplay on the medium level]

Acknowledgements

  1. The Minesweeper environment is borrowed from aylint/gym-minesweeper.
  2. The parallel PPO implementation is built on the Ray framework.
