A from-scratch implementation of self-play PPO and a custom Gymnasium racing environment, compared against single-agent PPO (both from scratch and via Stable Baselines3).
Read about it in detail in my blog post.
This project showcases a complete RL pipeline from environment design to multi-agent training, featuring:
- Custom racing environment built from scratch using Gymnasium
- PPO implementation from scratch
- Self-play training where agents learn to race competitively against past versions of themselves
- Multi-agent dynamics with collision detection and competitive reward structures
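The competitive reward structure itself is not spelled out in this README; the snippet below is only an illustrative sketch of what a per-step racing reward of this kind typically looks like (all terms, coefficients, and the `racing_reward` name are assumptions, not the values used in `multi_racing_env.py`):

```python
def racing_reward(progress_delta, collided, off_track, lead_margin):
    """Illustrative per-step reward for one car (hypothetical, not the repo's actual shaping).

    progress_delta: track progress gained this step
    collided: True if the car hit a wall or the opponent
    off_track: True if the car left the track boundaries
    lead_margin: own progress minus opponent progress (the competitive term)
    """
    reward = 10.0 * progress_delta   # reward forward progress along the track
    reward += 0.1 * lead_margin      # small bonus for being ahead of the opponent
    if collided:
        reward -= 5.0                # discourage crashes
    if off_track:
        reward -= 5.0                # discourage leaving the track
    return reward
```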
Demo video: racing_grid.mp4
Performance Metrics (Successful Runs Only):
- Success Rate: % of races completed
- Average Speed: Racing velocity in successful runs
- Average Distance: Total path length (lower = tighter racing line)
Efficiency Metrics (All Runs, Including Crashes):
- Steps/Progress: Time efficiency (lower = faster completion)
- Distance/Progress: Path efficiency (lower = optimal racing line)
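A minimal sketch of how these ratios could be aggregated from per-episode logs; the `EpisodeStats` container and its field names are assumptions for illustration, not the actual API in `utils/metrics.py`:

```python
from dataclasses import dataclass

@dataclass
class EpisodeStats:
    # Hypothetical per-episode log; field names are assumptions.
    finished: bool      # True if the race was completed without crashing
    steps: int          # environment steps taken
    distance: float     # total path length driven
    progress: float     # fraction of the track covered, in (0, 1]
    mean_speed: float   # average forward velocity

def summarize(episodes: list[EpisodeStats]) -> dict[str, float]:
    """Aggregate the performance and efficiency metrics described above."""
    finished = [e for e in episodes if e.finished]
    return {
        # Performance metrics: successful runs only.
        "success_rate": len(finished) / len(episodes),
        "avg_speed": sum(e.mean_speed for e in finished) / max(len(finished), 1),
        "avg_distance": sum(e.distance for e in finished) / max(len(finished), 1),
        # Efficiency metrics: all runs, crashes included.
        "steps_per_progress": sum(e.steps for e in episodes) / sum(e.progress for e in episodes),
        "distance_per_progress": sum(e.distance for e in episodes) / sum(e.progress for e in episodes),
    }
```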
Observation Space (19,):
- 11 raycasted distance sensors (180° front cone)
- Forward/lateral velocity, angular velocity, steering angle
- 4 relative features per opponent (position & velocity in local frame)
Action Space (2,):
- Steering: [-1, 1] (full left to full right)
- Throttle: [0, 1] (no acceleration to full throttle)
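A sketch of how these spaces map onto Gymnasium `Box` definitions for a head-to-head race with one opponent (11 rays + 4 ego features + 4 opponent features = 19); the bounds and constant names are assumptions, not necessarily the values used in `racing_env.py`:

```python
import numpy as np
from gymnasium import spaces

NUM_RAYS = 11          # raycast distance sensors over a 180° front cone
EGO_FEATURES = 4       # forward velocity, lateral velocity, angular velocity, steering angle
OPPONENT_FEATURES = 4  # relative position (x, y) and velocity (vx, vy) in the ego frame
NUM_OPPONENTS = 1      # 11 + 4 + 1 * 4 = 19

obs_dim = NUM_RAYS + EGO_FEATURES + NUM_OPPONENTS * OPPONENT_FEATURES

# Observation: Box of shape (19,); the infinite bounds here are placeholders.
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32)

# Action: steering in [-1, 1], throttle in [0, 1].
action_space = spaces.Box(
    low=np.array([-1.0, 0.0], dtype=np.float32),
    high=np.array([1.0, 1.0], dtype=np.float32),
    dtype=np.float32,
)
```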
Key Features:
- Generalized Advantage Estimation (GAE)
- Learning rate annealing
- Log-std annealing for exploration → exploitation
- Gradient clipping and advantage normalization
- KL divergence early stopping
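A minimal sketch of the GAE computation and advantage normalization listed above, following the standard formulation from Schulman et al.; variable names and the convention that `dones[t]` marks a terminal transition at step `t` are illustrative, not necessarily how `agent/ppo.py` is written:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over a rollout of length T.

    rewards, values, dones are arrays of shape (T,); dones[t] == 1 means the
    episode ended at step t. last_value bootstraps the state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t), with V(s_{t+1}) masked on termination
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        # Exponentially weighted sum of TD errors (lambda-return recursion)
        last_gae = delta + gamma * gae_lambda * next_nonterminal * last_gae
        advantages[t] = last_gae
    returns = advantages + values
    # Advantage normalization, as listed above
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    return advantages, returns
```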
Algorithm:
- Train agent vs random opponent (updates 1-15)
- Create snapshot every 15 updates
- Maintain opponent pool (max 5 snapshots)
- Each rollout samples random opponent from pool
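A sketch of the snapshot/opponent-pool loop described above; the `deque`-based pool, counters, and function names are illustrative, and `agent/self_play_ppo.py` may structure this differently:

```python
import copy
import random
from collections import deque

SNAPSHOT_EVERY = 15   # freeze a copy of the current policy every 15 updates
MAX_POOL_SIZE = 5     # keep only the 5 most recent snapshots

opponent_pool = deque(maxlen=MAX_POOL_SIZE)  # empty pool -> race against a random opponent

def pick_opponent():
    """Sample the opponent for the next rollout."""
    if not opponent_pool:
        return None  # None means the environment falls back to a random-action opponent
    return random.choice(list(opponent_pool))

def after_update(update_idx, policy):
    """Call once per PPO update to maintain the snapshot pool."""
    if update_idx % SNAPSHOT_EVERY == 0:
        # Oldest snapshots fall out automatically once the deque is full.
        opponent_pool.append(copy.deepcopy(policy))
```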
# Clone repository
git clone https://github.com/yourusername/racing-self-play
cd racing-self-play
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

├── agent/
│   ├── ppo.py               # PPO implementation from scratch
│   └── self_play_ppo.py     # Self-play wrapper with opponent pool
├── configs/
│   ├── base_config.py       # Base hyperparameters
│   └── self_play_config.py  # Self-play specific config
├── environment/
│   ├── car.py               # Vehicle physics
│   ├── multi_car.py         # Multi-agent car handling
│   ├── multi_racing_env.py  # Multi-agent racing environment
│   ├── multi_track.py       # Multi-agent track handling
│   ├── racing_env.py        # Single-agent racing environment
│   ├── track.py             # Procedural track generation & boundary logic
│   └── wrappers.py          # Self-play opponent wrapper
├── utils/
│   ├── metrics.py           # Evaluation metrics
│   └── visualization.py     # Racing visualization & video generation
├── static/                  # Generated visualizations & metrics
├── train.py                 # Training script
├── evaluate.py              # Evaluation script
└── README.md
- Proximal Policy Optimization Algorithms - Schulman et al., 2017
- Emergent Complexity via Multi-Agent Competition - Bansal et al., 2017
- Mastering Atari with Self-Play - OpenAI Five
- Stable Baselines3 Documentation
- Hugging Face DRL
MIT License

