RL Project: Atari Tennis Tournament

Comparison of Reinforcement Learning algorithms on Atari Tennis (ALE/Tennis-v5 via Gymnasium/PettingZoo).

Overview

This project implements and compares five agents (four RL algorithms plus a random baseline) playing Atari Tennis against the built-in AI and against each other in head-to-head tournaments.

Algorithms

Agent	Type	Policy	Update Rule
Random	Baseline	Uniform random	None
SARSA	TD(0), on-policy	ε-greedy	$W_a \leftarrow W_a + \alpha \cdot (r + \gamma \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$
Q-Learning	TD(0), off-policy	ε-greedy	$W_a \leftarrow W_a + \alpha \cdot (r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$
Monte Carlo	First-visit MC	ε-greedy	$W_a \leftarrow W_a + \alpha \cdot (G_t - \hat{q}(s, a)) \cdot \phi(s)$
DQN	Deep Q-Network	ε-greedy	MLP (256→256) with experience replay & target network
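The three linear update rules in the table can be sketched with NumPy. This assumes the weights are stored as one row per action in an array `W` of shape (18, 128) and that `phi` is the 128-dimensional RAM feature vector. The function names and the `done` flag (which drops the bootstrap term at the end of an episode, a standard convention the table does not spell out) are illustrative assumptions, not the notebook's code.

```python
import numpy as np

N_FEATURES = 128  # RAM observation size
N_ACTIONS = 18    # Atari Tennis discrete actions

def q_hat(W, phi, a):
    """Linear action value: q(s, a; W) = W_a^T phi(s)."""
    return W[a] @ phi

def sarsa_update(W, phi, a, r, phi_next, a_next, alpha, gamma, done):
    """On-policy TD(0): bootstrap on the action a' actually chosen in s'."""
    target = r if done else r + gamma * q_hat(W, phi_next, a_next)
    td_error = target - q_hat(W, phi, a)
    W[a] += alpha * td_error * phi  # gradient of W_a^T phi w.r.t. W_a is phi
    return td_error

def q_learning_update(W, phi, a, r, phi_next, alpha, gamma, done):
    """Off-policy TD(0): bootstrap on the greedy value max_a' q(s', a')."""
    target = r if done else r + gamma * np.max(W @ phi_next)
    td_error = target - q_hat(W, phi, a)
    W[a] += alpha * td_error * phi
    return td_error

def first_visit_mc_update(W, episode, alpha, gamma):
    """episode: list of (phi_t, a_t, r_{t+1}). Move q(s, a) toward G_t on first visits only."""
    # Returns computed backwards: G_t = r_{t+1} + gamma * G_{t+1}
    returns, G = [], 0.0
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    seen = set()
    for (phi, a, _), G_t in zip(episode, returns):
        key = (phi.tobytes(), a)  # identify (state, action) pairs by exact features
        if key in seen:
            continue
        seen.add(key)
        W[a] += alpha * (G_t - q_hat(W, phi, a)) * phi
```

The only difference between SARSA and Q-Learning here is the bootstrap term, which is exactly the difference between the two rows of the table.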

Architecture

Linear agents (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
DQN: MLP (128 → 128 → 64 → 18) trained with the Adam optimizer, Huber loss, and periodic target-network synchronization
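The notebook's DQN is built with PyTorch; the NumPy sketch below only illustrates the training ingredients named above: an experience replay buffer, the bootstrapped target computed from a separate target network, the Huber loss, and a periodic hard sync. The class and function names, the parameter dictionaries, and the values used for capacity, γ, δ, and the sync interval are hypothetical placeholders.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive frames.
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)

def td_targets(q_next_target, r, done, gamma):
    """y = r + gamma * (1 - done) * max_a' Q_target(s', a'), using the frozen target net's outputs."""
    done = np.asarray(done, dtype=np.float64)
    return r + gamma * (1.0 - done) * q_next_target.max(axis=1)

def huber(x, delta=1.0):
    """Elementwise Huber loss: 0.5 * x^2 for |x| <= delta, linear beyond."""
    abs_x = np.abs(x)
    return np.where(abs_x <= delta, 0.5 * x**2, delta * (abs_x - 0.5 * delta))

def maybe_sync(step, sync_every, online, target):
    """Hard update: copy the online parameters into the target network every `sync_every` steps."""
    if step % sync_every == 0:
        for name, value in online.items():
            target[name] = value.copy()
```

In PyTorch the same pieces map to `torch.nn.HuberLoss` (or `torch.nn.SmoothL1Loss`) for the loss and `target_net.load_state_dict(online_net.state_dict())` for the hard sync.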

Environment

Game: Atari Tennis via PettingZoo (tennis_v3)
Observation: RAM state (128 features)
Action Space: 18 discrete actions
Agents: 2 players (first_0 and second_0)
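In PettingZoo's AEC API the two players take turns: `agent_iter()` yields `first_0` and `second_0` alternately, `last()` returns the current agent's observation, reward, termination, truncation, and info, and a finished agent must be stepped with `None`. The sketch below assumes the environment exposes 128 uint8 RAM bytes (for example via `tennis_v3.env(obs_type="ram")`) and that normalization means dividing by 255. Both the scaling and the helpers `normalize_ram` and `play_episode` are illustrative assumptions, not the notebook's own functions.

```python
import numpy as np

RAM_SIZE = 128  # Atari 2600 RAM is 128 bytes

def normalize_ram(obs):
    """Scale a raw uint8 RAM vector (values 0-255) to float32 features in [0, 1]."""
    obs = np.asarray(obs, dtype=np.float32)
    if obs.shape != (RAM_SIZE,):
        raise ValueError(f"expected shape ({RAM_SIZE},), got {obs.shape}")
    return obs / 255.0

def play_episode(env, policies, seed=None):
    """Run one episode of a PettingZoo AEC env.

    policies maps an agent name ("first_0", "second_0") to a callable
    features -> action. Returns the summed reward of each agent.
    """
    env.reset(seed=seed)
    totals = {agent: 0.0 for agent in policies}
    for agent in env.agent_iter():
        obs, reward, termination, truncation, _ = env.last()
        totals[agent] += reward
        if termination or truncation:
            action = None  # AEC envs expect None once an agent is done
        else:
            action = policies[agent](normalize_ram(obs))
        env.step(action)
    return totals
```

With PettingZoo and the Atari ROMs installed, usage would look like `play_episode(tennis_v3.env(obs_type="ram"), {"first_0": agent_a, "second_0": agent_b})`, where `agent_a` and `agent_b` are any callables returning one of the 18 actions.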

Project Structure

.
├── Project_RL_DANJOU_VON-SIEMENS.ipynb   # Main notebook
├── README.md                              # This file
├── checkpoints/                           # Saved agent weights
│   ├── sarsa.pkl
│   ├── q_learning.pkl
│   ├── montecarlo.pkl
│   └── dqn.pkl
└── plots/                                 # Training & evaluation plots
    ├── SARSA_training_curves.png
    ├── Q-Learning_training_curves.png
    ├── MonteCarlo_training_curves.png
    ├── DQN_training_curves.png
    ├── evaluation_results.png
    └── championship_matrix.png
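The `.pkl` files in `checkpoints/` are pickled agent weights. A minimal save/load round trip might look like the sketch below; the payload layout (a dict with `"weights"` and `"meta"` keys) and the helper names are hypothetical, not the notebook's format.

```python
import os
import pickle

import numpy as np

def save_checkpoint(path, weights, meta=None):
    """Pickle an agent's weights plus optional metadata, creating parent directories."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump({"weights": weights, "meta": meta or {}}, f)

def load_checkpoint(path):
    """Return (weights, meta) from a file written by save_checkpoint."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    return payload["weights"], payload["meta"]
```

For the PyTorch DQN, storing `model.state_dict()` (or using `torch.save` on it) rather than the module object is the usual way to keep checkpoints loadable into a freshly constructed network.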

Key Results

Win Rate vs Random Baseline

Agent	Win Rate
SARSA	88.9%
Q-Learning	41.2%
Monte Carlo	47.1%
DQN	6.2%

Championship Tournament

Full round-robin tournament where each agent faces every other agent in both positions (first_0/second_0).

Notebook Sections

Configuration & Checkpoints — Incremental training workflow with pickle serialization
Utility Functions — Observation normalization, ε-greedy policy
Agent Definitions — RandomAgent, SarsaAgent, QLearningAgent, MonteCarloAgent, DQNAgent
Training Infrastructure — train_agent(), plot_training_curves()
Evaluation — Match system, random baseline, round-robin tournament
Results & Visualization — Win rate plots, matchup matrix heatmap
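All four learning agents act ε-greedily. A minimal sketch of that policy, assuming `q_values` holds the 18 action values (for example `W @ phi` for the linear agents) and `rng` is a NumPy `Generator`; the random tie-breaking among equal maxima is a design choice made here, not something the notebook specifies:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, otherwise a greedy one."""
    q_values = np.asarray(q_values)
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    best = np.flatnonzero(q_values == q_values.max())
    return int(rng.choice(best))  # exploit, breaking ties at random
```

Random tie-breaking matters early in training, when zero-initialized linear weights give every action the same value and `argmax` alone would always return action 0.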

Known Issues

Monte Carlo & DQN: checkpoint loading issues. Saved weights may not restore properly during evaluation, although training itself works correctly.

Dependencies

Python 3.13+
numpy, matplotlib
torch
gymnasium, ale-py
pettingzoo
tqdm

Authors

Arthur DANJOU
Moritz VON SIEMENS
