A from-scratch implementation of self-play PPO and a custom Gymnasium racing environment, compared against single-agent PPO (both from scratch and via Stable Baselines3).
Read about it in detail in my blog post.
This project showcases a complete RL pipeline from environment design to multi-agent training, featuring:
- Custom racing environment built from scratch using Gymnasium
- PPO implementation from scratch
- Self-play training where agents learn to race competitively against past versions of themselves
- Multi-agent dynamics with collision detection and competitive reward structures
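The competitive reward structure itself is not spelled out in this README; the snippet below is only an illustrative sketch of what a per-step racing reward of this kind typically looks like (all terms, coefficients, and the `racing_reward` name are assumptions, not the values used in `multi_racing_env.py`):

```python
def racing_reward(progress_delta, collided, off_track, lead_margin):
    """Illustrative per-step reward for one car (hypothetical, not the repo's actual shaping).

    progress_delta: track progress gained this step
    collided: True if the car hit a wall or the opponent
    off_track: True if the car left the track boundaries
    lead_margin: own progress minus opponent progress (the competitive term)
    """
    reward = 10.0 * progress_delta   # reward forward progress along the track
    reward += 0.1 * lead_margin      # small bonus for being ahead of the opponent
    if collided:
        reward -= 5.0                # discourage crashes
    if off_track:
        reward -= 5.0                # discourage leaving the track
    return reward
```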
Demo video: racing_grid.mp4
Performance Metrics (Successful Runs Only):
- Success Rate: % of races completed
- Average Speed: Racing velocity in successful runs
- Average Distance: Total path length (lower = tighter racing line)
Efficiency Metrics (All Runs, Including Crashes):
- Steps/Progress: Time efficiency (lower = faster completion)
- Distance/Progress: Path efficiency (lower = optimal racing line)
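A minimal sketch of how these ratios could be aggregated from per-episode logs; the `EpisodeStats` container and its field names are assumptions for illustration, not the actual API in `utils/metrics.py`:

```python
from dataclasses import dataclass

@dataclass
class EpisodeStats:
    # Hypothetical per-episode log; field names are assumptions.
    finished: bool      # True if the race was completed without crashing
    steps: int          # environment steps taken
    distance: float     # total path length driven
    progress: float     # fraction of the track covered, in (0, 1]
    mean_speed: float   # average forward velocity

def summarize(episodes: list[EpisodeStats]) -> dict[str, float]:
    """Aggregate the performance and efficiency metrics described above."""
    finished = [e for e in episodes if e.finished]
    return {
        # Performance metrics: successful runs only.
        "success_rate": len(finished) / len(episodes),
        "avg_speed": sum(e.mean_speed for e in finished) / max(len(finished), 1),
        "avg_distance": sum(e.distance for e in finished) / max(len(finished), 1),
        # Efficiency metrics: all runs, crashes included.
        "steps_per_progress": sum(e.steps for e in episodes) / sum(e.progress for e in episodes),
        "distance_per_progress": sum(e.distance for e in episodes) / sum(e.progress for e in episodes),
    }
```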
Observation Space (19,):
- 11 raycasted distance sensors (180° front cone)
- Forward/lateral velocity, angular velocity, steering angle
- 4 relative features per opponent (position & velocity in local frame)
Action Space (2,):
- Steering: [-1, 1] (full left to full right)
- Throttle: [0, 1] (no acceleration to full throttle)
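A sketch of how these spaces map onto Gymnasium `Box` definitions for a head-to-head race with one opponent (11 rays + 4 ego features + 4 opponent features = 19); the bounds and constant names are assumptions, not necessarily the values used in `racing_env.py`:

```python
import numpy as np
from gymnasium import spaces

NUM_RAYS = 11          # raycast distance sensors over a 180° front cone
EGO_FEATURES = 4       # forward velocity, lateral velocity, angular velocity, steering angle
OPPONENT_FEATURES = 4  # relative position (x, y) and velocity (vx, vy) in the ego frame
NUM_OPPONENTS = 1      # 11 + 4 + 1 * 4 = 19

obs_dim = NUM_RAYS + EGO_FEATURES + NUM_OPPONENTS * OPPONENT_FEATURES

# Observation: Box of shape (19,); the infinite bounds here are placeholders.
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32)

# Action: steering in [-1, 1], throttle in [0, 1].
action_space = spaces.Box(
    low=np.array([-1.0, 0.0], dtype=np.float32),
    high=np.array([1.0, 1.0], dtype=np.float32),
    dtype=np.float32,
)
```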
Key Features:
- Generalized Advantage Estimation (GAE)
- Learning rate annealing
- Log-std annealing for exploration → exploitation
- Gradient clipping and advantage normalization
- KL divergence early stopping
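A minimal sketch of the GAE computation and advantage normalization listed above, following the standard formulation from Schulman et al.; variable names and the convention that `dones[t]` marks a terminal transition at step `t` are illustrative, not necessarily how `agent/ppo.py` is written:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over a rollout of length T.

    rewards, values, dones are arrays of shape (T,); dones[t] == 1 means the
    episode ended at step t. last_value bootstraps the state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t), with V(s_{t+1}) masked on termination
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        # Exponentially weighted sum of TD errors (lambda-return recursion)
        last_gae = delta + gamma * gae_lambda * next_nonterminal * last_gae
        advantages[t] = last_gae
    returns = advantages + values
    # Advantage normalization, as listed above
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    return advantages, returns
```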
Algorithm:
- Train agent vs random opponent (updates 1-15)
- Create snapshot every 15 updates
- Maintain opponent pool (max 5 snapshots)
- Each rollout samples random opponent from pool
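A sketch of the snapshot/opponent-pool loop described above; the `deque`-based pool, counters, and function names are illustrative, and `agent/self_play_ppo.py` may structure this differently:

```python
import copy
import random
from collections import deque

SNAPSHOT_EVERY = 15   # freeze a copy of the current policy every 15 updates
MAX_POOL_SIZE = 5     # keep only the 5 most recent snapshots

opponent_pool = deque(maxlen=MAX_POOL_SIZE)  # empty pool -> race against a random opponent

def pick_opponent():
    """Sample the opponent for the next rollout."""
    if not opponent_pool:
        return None  # None means the environment falls back to a random-action opponent
    return random.choice(list(opponent_pool))

def after_update(update_idx, policy):
    """Call once per PPO update to maintain the snapshot pool."""
    if update_idx % SNAPSHOT_EVERY == 0:
        # Oldest snapshots fall out automatically once the deque is full.
        opponent_pool.append(copy.deepcopy(policy))
```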
# Clone repository
git clone https://github.com/yourusername/racing-self-play
cd racing-self-play
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

├── agent/
│   ├── ppo.py               # PPO implementation from scratch
│   └── self_play_ppo.py     # Self-play wrapper with opponent pool
├── configs/
│   ├── base_config.py       # Base hyperparameters
│   └── self_play_config.py  # Self-play specific config
├── environment/
│   ├── car.py               # Vehicle physics
│   ├── multi_car.py         # Multi-agent car handling
│   ├── multi_racing_env.py  # Multi-agent racing environment
│   ├── multi_track.py       # Multi-agent track handling
│   ├── racing_env.py        # Single-agent racing environment
│   ├── track.py             # Procedural track generation & boundary logic
│   └── wrappers.py          # Self-play opponent wrapper
├── utils/
│   ├── metrics.py           # Evaluation metrics
│   └── visualization.py     # Racing visualization & video generation
├── static/                  # Generated visualizations & metrics
├── train.py                 # Training script
├── evaluate.py              # Evaluation script
└── README.md
- Proximal Policy Optimization Algorithms - Schulman et al., 2017
- Emergent Complexity via Multi-Agent Competition - Bansal et al., 2017
- Mastering Atari with Self-Play - OpenAI Five
- Stable Baselines3 Documentation
- Hugging Face DRL
MIT License

