This project implements a Deep Reinforcement Learning (DRL) agent that learns to play a shooting game on an ESP32 microcontroller. The agent uses a Deep Q-Network (DQN) to learn optimal strategies for moving and shooting in a 2D game environment.
The project consists of three main components:
- Python Training Environment (`Train_IN_Py.py`):
  - Implements the game environment using Pygame
  - Contains the DQN agent implementation
  - Handles training and weight generation
  - Saves trained weights in a format compatible with the ESP32 (see the export sketch after this list)
- Game Environment (`RL_env.py`):
  - Implements the actual game mechanics
  - Handles rendering and game state
  - Provides a TCP server for communication with the ESP32
  - Manages game objects (player, bullets, enemies)
- ESP32 Implementation (`DRLagentINESP32.ino`):
  - Runs the trained DQN on the ESP32
  - Communicates with the game environment via TCP
  - Implements the neural network inference
  - Handles weight updates and learning
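The training script exports weights in a C-friendly format (the generated `live_weights.h` is mentioned in the training steps below). The exact layout depends on the repository's code; a minimal sketch of such an export, with illustrative array names, might look like this:

```python
import numpy as np

def export_weights_to_header(layers, path="live_weights.h"):
    """Dump (name, weight_matrix, bias_vector) tuples as C float arrays.

    Illustrative only: the array names and layout must match whatever
    the ESP32 sketch expects when it includes the generated header.
    """
    with open(path, "w") as f:
        f.write("// Auto-generated by the training script\n")
        for name, w, b in layers:
            w = np.asarray(w, dtype=np.float32)
            b = np.asarray(b, dtype=np.float32)
            f.write(f"const float {name}_w[{w.shape[0]}][{w.shape[1]}] = {{\n")
            for row in w:
                f.write("  {" + ", ".join(f"{v:.6f}f" for v in row) + "},\n")
            f.write("};\n")
            f.write(f"const float {name}_b[{b.shape[0]}] = {{"
                    + ", ".join(f"{v:.6f}f" for v in b) + "};\n")
```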
- Adaptive Learning: The agent adjusts its learning rate and exploration rate dynamically based on performance
- Experience Replay: Implements prioritized experience replay for better sample efficiency
- Double DQN: Uses a target network for more stable learning (see the target-computation sketch after this list)
- Enhanced Rewards: Detailed reward shaping for better learning
- Real-time Training: Can train and update weights while playing
- Cross-Platform: Works between the Python environment and the ESP32
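For the Double DQN feature above, the key idea is that the online network selects the next action while the target network evaluates it. A minimal PyTorch sketch of that target computation (not the repository's exact code; function and tensor names are illustrative):

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.95):
    """Double DQN: pick next actions with the online net, score them with the target net."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)
```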
The DQN consists of:
- Input Layer: 5 neurons (player_x, bullet_x, bullet_y, enemy_x, enemy_y)
- Hidden Layer 1: 16 neurons with ReLU activation
- Hidden Layer 2: 16 neurons with ReLU activation
- Output Layer: 3 neurons (LEFT, RIGHT, SHOOT)

Training hyperparameters:

- Learning Rate: 0.001 (adaptive)
- Discount Factor: 0.95
- Epsilon (exploration): 1.0 to 0.01
- Memory Size: 500
- Batch Size: 32
- Target Update Frequency: 50
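Taken together, a minimal PyTorch sketch of a network with this shape and these default values (class and constant names are illustrative, not taken from the repository):

```python
import torch.nn as nn

class DQN(nn.Module):
    """5 state inputs -> 16 -> 16 -> 3 action values (LEFT, RIGHT, SHOOT)."""
    def __init__(self, state_dim=5, hidden=16, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# Defaults from the list above
LEARNING_RATE = 0.001        # adaptive during training
GAMMA = 0.95                 # discount factor
EPSILON_START, EPSILON_MIN = 1.0, 0.01
MEMORY_SIZE = 500
BATCH_SIZE = 32
TARGET_UPDATE_EVERY = 50     # steps between target-network syncs
```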
The agent receives rewards for:
- Hitting enemies (+30.0)
- Being aligned with enemies (+3.0)
- Moving towards enemies (+0.7)
- Staying near center (+0.5)
And penalties for:
- Missing shots (-2.0)
- Being near edges (-6.0)
- Being stuck (-4.0)
- Being too far from enemies (-1.0)
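The exact trigger conditions live in the environment code; a simplified sketch of how these shaping terms could be combined each step (the boolean flags and their thresholds are assumptions):

```python
def compute_reward(hit, shot_missed, aligned, moved_toward_enemy,
                   near_center, near_edge, stuck, far_from_enemy):
    """Combine the shaping terms listed above; each flag is a boolean
    the environment would compute for the current step."""
    reward = 0.0
    if hit:                reward += 30.0
    if aligned:            reward += 3.0
    if moved_toward_enemy: reward += 0.7
    if near_center:        reward += 0.5
    if shot_missed:        reward -= 2.0
    if near_edge:          reward -= 6.0
    if stuck:              reward -= 4.0
    if far_from_enemy:     reward -= 1.0
    return reward
```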
- Python Environment Setup:
  - Install the dependencies: `pip install pygame numpy torch`
- ESP32 Setup:
  - Install the Arduino IDE
  - Install ESP32 board support
  - Install the required libraries (`WiFi.h`, `EEPROM.h`)
- Training:
  - Run `python Train_IN_Py.py`
  - Training starts in headless mode
  - Rendering is enabled once the success rate exceeds 0.25
  - Weights are saved to `live_weights.h` periodically
- Running on ESP32:
  - Upload the ESP32 code
  - Connect to the same WiFi network as the training environment
  - The ESP32 will automatically connect to the game environment
- TCP Port: 5050
- State Format: "STATE:player_x,bullet_x,bullet_y,enemy_x,enemy_y;REWARD:value"
- Action Format: "LEFT\n", "RIGHT\n", or "SHOOT\n"
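For reference, a minimal Python client that speaks this protocol the way the ESP32 does, which can be handy for testing `RL_env.py` without hardware (the host address is a placeholder, and the sketch assumes one complete message per `recv`):

```python
import socket

HOST, PORT = "192.168.1.100", 5050  # placeholder: IP of the machine running RL_env.py

with socket.create_connection((HOST, PORT)) as sock:
    while True:
        msg = sock.recv(1024).decode().strip()
        if not msg:
            break
        # Expected form: STATE:player_x,bullet_x,bullet_y,enemy_x,enemy_y;REWARD:value
        state_part, reward_part = msg.split(";REWARD:")
        state = [float(v) for v in state_part[len("STATE:"):].split(",")]
        reward = float(reward_part)
        sock.sendall(b"SHOOT\n")  # reply with "LEFT\n", "RIGHT\n", or "SHOOT\n"
```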
The system tracks:
- Success rate (hits in last 20 actions)
- Learning rate (adaptive)
- Epsilon (exploration rate)
- Training steps
- Episode rewards
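As an illustration, the success rate over the last 20 actions can be tracked with a fixed-length window (a sketch, not the repository's exact bookkeeping):

```python
from collections import deque

recent_hits = deque(maxlen=20)  # 1 if the action resulted in a hit, else 0

def record_outcome(hit: bool) -> float:
    """Append the latest outcome and return the rolling success rate."""
    recent_hits.append(1 if hit else 0)
    return sum(recent_hits) / len(recent_hits)
```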
Feel free to contribute to this project by:
- Forking the repository
- Creating a feature branch
- Making your changes
- Submitting a pull request