Comparison of Reinforcement Learning algorithms on Atari Tennis (ALE/Tennis-v5 via Gymnasium/PettingZoo).
This project implements and compares five RL agents playing Atari Tennis against the built-in AI and in head-to-head tournaments.
| Agent | Type | Policy | Update Rule |
|---|---|---|---|
| Random | Baseline | Uniform random | None |
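| SARSA | TD(0), on-policy | ε-greedy | $\mathbf{W}_a \leftarrow \mathbf{W}_a + \alpha\,[r + \gamma\,\hat{q}(s', a') - \hat{q}(s, a)]\,\phi(s)$ |
| Q-Learning | TD(0), off-policy | ε-greedy | $\mathbf{W}_a \leftarrow \mathbf{W}_a + \alpha\,[r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)]\,\phi(s)$ |
| Monte Carlo | First-visit MC | ε-greedy | $\mathbf{W}_a \leftarrow \mathbf{W}_a + \alpha\,[G_t - \hat{q}(s, a)]\,\phi(s)$ |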
| DQN | Deep Q-Network | ε-greedy | MLP (256→256) with experience replay & target network |
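
The DQN row mentions experience replay and a target network. A minimal sketch of those two pieces is given here for orientation; `ReplayBuffer`, `should_sync_target`, the transition layout, and the `sync_every` parameter are hypothetical names chosen for illustration and may not match the notebook.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done); this field layout is an assumption.
Transition = Tuple[List[float], int, float, List[float], bool]


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def should_sync_target(step: int, sync_every: int) -> bool:
    """True on the steps where the target network would copy the online weights."""
    return step > 0 and step % sync_every == 0
```

Sampling old transitions at random and bootstrapping from a slowly updated copy of the network are the two standard stabilizers for DQN training; the buffer capacity and sync period are tuning choices that this sketch leaves open.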
- Linear agents (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
- DQN: MLP network (128 → 128 → 64 → 18) trained with Adam optimizer, Huber loss, and periodic target network sync

- Game: Atari Tennis via PettingZoo (`tennis_v3`)
- Observation: RAM state (128 features)
- Action Space: 18 discrete actions
- Agents: 2 players (`first_0` and `second_0`)
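
The linear approximation $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ can be sketched with one weight row per action. The following is a minimal NumPy illustration of the textbook semi-gradient SARSA and Q-Learning updates, not the notebook's code; the function names and the default `alpha` and `gamma` values are assumptions.

```python
import numpy as np

N_FEATURES = 128  # RAM observation size
N_ACTIONS = 18    # discrete actions


def q_values(W: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Return q_hat(s, a) for every action; W has shape (N_ACTIONS, N_FEATURES)."""
    return W @ phi


def sarsa_update(W, phi, a, r, phi_next, a_next, done, alpha=0.01, gamma=0.99):
    """On-policy TD(0): bootstrap from the action actually taken next."""
    target = r if done else r + gamma * (W[a_next] @ phi_next)
    td_error = target - W[a] @ phi
    W[a] += alpha * td_error * phi  # gradient of W_a^T phi w.r.t. W_a is phi
    return td_error


def q_learning_update(W, phi, a, r, phi_next, done, alpha=0.01, gamma=0.99):
    """Off-policy TD(0): bootstrap from the greedy next action."""
    target = r if done else r + gamma * np.max(W @ phi_next)
    td_error = target - W[a] @ phi
    W[a] += alpha * td_error * phi
    return td_error
```

Only the row of the chosen action changes on each step, which is why the two algorithms differ solely in how the bootstrap target is formed.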
.
├── Project_RL_DANJOU_VON-SIEMENS.ipynb # Main notebook
├── README.md # This file
├── checkpoints/ # Saved agent weights
│ ├── sarsa.pkl
│ ├── q_learning.pkl
│ ├── montecarlo.pkl
│ └── dqn.pkl
└── plots/ # Training & evaluation plots
    ├── SARSA_training_curves.png
    ├── Q-Learning_training_curves.png
    ├── MonteCarlo_training_curves.png
    ├── DQN_training_curves.png
    ├── evaluation_results.png
    └── championship_matrix.png
| Agent | Win Rate |
|---|---|
| SARSA | 88.9% |
| Q-Learning | 41.2% |
| Monte Carlo | 47.1% |
| DQN | 6.2% |
A full round-robin tournament in which each agent faces every other agent in both positions (`first_0` and `second_0`).
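
The pairing scheme described above, in which every agent meets every other agent in both seats, amounts to enumerating ordered pairs. A small sketch follows; `round_robin_schedule` is a hypothetical helper, and the list of participating agents is an assumption (the notebook may also include the random baseline).

```python
from itertools import permutations
from typing import List, Tuple


def round_robin_schedule(agents: List[str]) -> List[Tuple[str, str]]:
    """Return (first_0, second_0) pairings covering both seat orders."""
    # Ordered pairs of distinct agents: A vs B and B vs A are separate matches.
    return list(permutations(agents, 2))


schedule = round_robin_schedule(["SARSA", "Q-Learning", "Monte Carlo", "DQN"])
for first, second in schedule:
    print(f"first_0={first:<12} second_0={second}")
```

With four agents this yields 12 matches, since each of the 6 unordered pairs is played once per seat assignment.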
- Configuration & Checkpoints: Incremental training workflow with pickle serialization
- Utility Functions: Observation normalization, ε-greedy policy
- Agent Definitions: `RandomAgent`, `SarsaAgent`, `QLearningAgent`, `MonteCarloAgent`, `DQNAgent`
- Training Infrastructure: `train_agent()`, `plot_training_curves()`
- Evaluation: Match system, random baseline, round-robin tournament
- Results & Visualization: Win rate plots, matchup matrix heatmap
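
The two utility functions named in the list above can be illustrated as follows. This is a sketch under stated assumptions: `normalize_ram` and `epsilon_greedy` are hypothetical names, and dividing by 255 assumes the RAM observation is an array of unsigned bytes; the notebook's normalization may differ.

```python
import numpy as np


def normalize_ram(obs: np.ndarray) -> np.ndarray:
    """Scale RAM bytes (0..255) to floats in [0, 1]; the scaling is an assumption."""
    return obs.astype(np.float32) / 255.0


def epsilon_greedy(q: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Pick a uniform random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))
```

A typical call would be `epsilon_greedy(W @ normalize_ram(obs), 0.1, np.random.default_rng(0))`, where `W` is a weight matrix with one row per action.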
- Monte Carlo & DQN: Checkpoint loading issues. Saved weights may not restore properly during evaluation; training itself works correctly.
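
One way to narrow down a checkpoint problem like this is a save-and-reload round trip that compares the restored state with the original. The sketch below does not diagnose the actual cause; `save_checkpoint`, `load_checkpoint`, `round_trip_ok`, and the state dictionary layout are all hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint(path: str) -> Dict[str, Any]:
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip_ok(state: Dict[str, Any]) -> bool:
    """Save to a temporary file, reload, and compare with the original."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "agent.pkl")
        save_checkpoint(path, state)
        # Plain Python containers compare with ==; NumPy arrays would need
        # np.array_equal on each entry instead.
        return load_checkpoint(path) == state


print(round_trip_ok({"weights": [[0.1, 0.2], [0.3, 0.4]], "epsilon": 0.05}))
```

If the round trip succeeds but evaluation still behaves like an untrained agent, the restored values are probably not being assigned back onto the agent object. For a PyTorch model, the analogous check would compare the tensors in `state_dict()` before saving and after `load_state_dict()`.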
- Python 3.13+
- `numpy`, `matplotlib`
- `torch`
- `gymnasium`, `ale-py`
- `pettingzoo`
- `tqdm`
- Arthur DANJOU
- Moritz VON SIEMENS