Minesweeper Reinforcement Learning

Train an AI agent to play Minesweeper using a Deep Q-Network (DQN). This project demonstrates reinforcement learning techniques applied to the classic Minesweeper game, with support for both CPU and GPU training.

🎯 Features

  • Deep Q-Network (DQN) with target network and experience replay
  • Double DQN for more stable training
  • Parallel environment training for faster data collection
  • Support for both CPU and GPU with automatic optimization
  • Real-time visualization with Pygame GUI
  • Training progress plots with win rate, rewards, and speed metrics
  • Mixed precision training (GPU only) for faster computation
  • Configurable hyperparameters via command line

📋 Requirements

  • Python 3.8+
  • PyTorch
  • Gymnasium
  • Pygame
  • Matplotlib
  • NumPy

🚀 Installation

  1. Clone the repository:
git clone https://github.com/keypaa/minesweeper-rl.git
cd minesweeper-rl
  2. Install dependencies:
pip install -r requirements.txt

🎮 Usage

Basic Training (Auto-detect GPU/CPU)

python train.py

This automatically uses CUDA if it is available and otherwise falls back to the CPU.
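A minimal sketch of how this kind of auto-detection is typically written in PyTorch (the actual logic lives in train.py and may differ):

import torch

# Prefer CUDA when a GPU is visible; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on {device}")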

Specify Device

Train on GPU:

python train.py --device cuda

Train on CPU:

python train.py --device cpu

Custom Number of Episodes

python train.py --episodes 2000

All Options

python train.py --device cuda --episodes 5000

⚙️ Configuration

The training script automatically adjusts its settings to the detected device (a configuration sketch follows the two lists below):

GPU Configuration (CUDA)

  • Parallel Environments: 128
  • Batch Size: 2048
  • Updates per Step: 4
  • Replay Buffer: 100,000
  • Mixed Precision: Enabled
  • Model Compilation: Enabled (PyTorch 2.0+)

CPU Configuration

  • Parallel Environments: 8
  • Batch Size: 128
  • Updates per Step: 1
  • Replay Buffer: 50,000
  • Mixed Precision: Disabled
  • Model Compilation: Disabled
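As a sketch, these presets could be plain dictionaries selected at startup; the key names here are illustrative, not the actual variables in train.py:

import torch

# Illustrative presets mirroring the lists above.
GPU_CONFIG = {
    "num_envs": 128, "batch_size": 2048, "updates_per_step": 4,
    "buffer_size": 100_000, "mixed_precision": True, "compile_model": True,
}
CPU_CONFIG = {
    "num_envs": 8, "batch_size": 128, "updates_per_step": 1,
    "buffer_size": 50_000, "mixed_precision": False, "compile_model": False,
}

config = GPU_CONFIG if torch.cuda.is_available() else CPU_CONFIG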

📊 Training Outputs

Logs

Training logs are saved to training_log_YYYYMMDD_HHMMSS.txt with detailed episode information.
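One common way to build such timestamped filenames (a sketch, not necessarily the exact code in train.py):

from datetime import datetime

# e.g. training_log_20240101_120000.txt
log_path = f"training_log_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt"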

Plots

Training progress plots are saved to the plots/ directory (a sketch of the moving-average smoothing follows this list), showing:

  • Win rate over episodes (with 50-episode moving average)
  • Episode rewards (with moving average)
  • Win rate vs training time
  • Training speed (episodes per minute)
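The moving averages can be computed with a simple convolution; a sketch assuming NumPy and a list of per-episode win flags (names are illustrative):

import numpy as np

def moving_average(values, window=50):
    """Average each value with its window-1 predecessors."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Example: smooth per-episode win flags (1 = win, 0 = loss).
wins = [0, 0, 1, 0, 1] * 100
win_rate = moving_average(wins)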

Visualization

Every 500 episodes, the agent plays a game with the GUI to visualize its current strategy.

🏗️ Project Structure

.
├── train.py              # Main training script
├── agent.py              # DQN agent implementation
├── minesweeper_env.py    # Gymnasium environment for Minesweeper
├── gui.py                # Pygame visualization
├── requirements.txt      # Python dependencies
└── README.md             # This file

🧠 How It Works

Environment

The Minesweeper environment is implemented as a Gymnasium-compatible environment (a minimal skeleton follows the list below):

  • State: 9x9 grid with cell values (-3: flagged, -2: hidden, -1: mine, 0-8: adjacent mine count)
  • Actions: 162 actions (81 reveal + 81 flag/unflag)
  • Rewards:
    • +10 for winning
    • -5 for hitting a mine
    • +1.0 for revealing a 0-cell
    • +0.2 for revealing numbered cells
    • +0.1 bonus per cell revealed by cascading
    • -1 for invalid moves
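A minimal skeleton of such an environment, assuming the standard Gymnasium API (the real minesweeper_env.py implements the full game logic; the mine count here is an assumption):

import gymnasium as gym
import numpy as np
from gymnasium import spaces

class MinesweeperEnv(gym.Env):
    """Illustrative 9x9 Minesweeper skeleton."""

    def __init__(self, size=9, n_mines=10):
        super().__init__()
        self.size = size
        self.n_mines = n_mines
        # Cell values span -3 (flagged) through 8 (adjacent mine count).
        self.observation_space = spaces.Box(
            low=-3, high=8, shape=(size, size), dtype=np.int8)
        # 81 reveal actions followed by 81 flag/unflag actions.
        self.action_space = spaces.Discrete(2 * size * size)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board = np.full((self.size, self.size), -2, dtype=np.int8)  # all hidden
        return self.board.copy(), {}

    def step(self, action):
        reward, terminated = 0.0, False
        # ... reveal or flag the targeted cell and score it per the list above ...
        return self.board.copy(), reward, terminated, False, {}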

Agent Architecture

  • Convolutional layers: Extract spatial features from the board (a network sketch follows this list)
  • Fully connected layers: Map features to Q-values for each action
  • Target network: Stabilizes training by providing consistent targets
  • Experience replay: Breaks correlation between consecutive samples
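A sketch of a network with this shape, assuming a 9x9 board and 162 actions; layer widths are illustrative, not the exact architecture in agent.py:

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Convolutional Q-network: board in, one Q-value per action out."""

    def __init__(self, board_size=9, n_actions=162):
        super().__init__()
        self.conv = nn.Sequential(  # spatial feature extraction
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(  # map features to Q-values
            nn.Flatten(),
            nn.Linear(64 * board_size * board_size, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):  # x: (batch, 1, 9, 9) float tensor
        return self.head(self.conv(x))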

Training Algorithm

  1. Collect experiences from parallel environments
  2. Store them in the replay buffer (50K-100K capacity)
  3. Sample random batches for training (128-2048 samples)
  4. Update the policy network using Double DQN (sketched after this list)
  5. Sync the target network every 500 updates
  6. Explore with epsilon-greedy (epsilon decays from 0.9 to 0.05)
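The Double DQN rule in step 4 lets the policy network choose the next action while the target network evaluates it, which reduces Q-value overestimation. A sketch with illustrative names (policy_net and target_net are two copies of the network above):

import torch

def double_dqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
    """TD targets: the policy net picks the action, the target net scores it."""
    with torch.no_grad():
        next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # Zero out the bootstrap term for terminal transitions.
    return rewards + gamma * next_q * (1.0 - dones)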

📈 Expected Performance

With GPU training (1000 episodes):

  • Training Time: ~11-15 minutes
  • Win Rate: 20-25%
  • Training Speed: ~1.5 episodes/second

With CPU training (1000 episodes):

  • Training Time: ~1-2 hours (varies by CPU)
  • Win Rate: similar to the GPU run after the same number of episodes
  • Training Speed: ~0.2-0.5 episodes/second

🔧 Customization

You can modify hyperparameters in train.py:

  • Learning rate, batch size, buffer size
  • Network architecture in agent.py
  • Reward structure in minesweeper_env.py
  • Epsilon decay schedule

🤝 Contributing

Contributions are welcome! Feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with PyTorch and Gymnasium
  • Inspired by DeepMind's DQN paper
  • Minesweeper game mechanics based on the classic Windows game

📧 Contact

For questions or suggestions, please open an issue on GitHub.


Happy Training! 🎮🤖
