This project studies deterministic grid-based path planning with tabular reinforcement learning and compares three classic methods (update rules sketched after this list):
- Monte Carlo Control
- SARSA
- Q-learning
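The three methods differ mainly in the target used to update the action-value table Q(s, a). As a point of reference, here is a minimal sketch of the three update rules; the function names, table layout, and `alpha`/`gamma` arguments are illustrative, not taken from this repository:

```python
import numpy as np

# Q is a 2D table Q[s, a]; alpha is the learning rate, gamma the discount factor.

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy TD target: bootstrap from the greedy action in s_next.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy TD target: bootstrap from the action actually taken next.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def monte_carlo_update(Q, episode, alpha, gamma):
    # Update only after the episode ends, toward the full discounted return.
    G = 0.0
    for s, a, r in reversed(episode):  # episode: list of (state, action, reward)
        G = r + gamma * G
        Q[s, a] += alpha * (G - Q[s, a])
```

The episode-end update in the Monte Carlo variant is exactly what makes its learning slower on these maps, as noted in the findings below.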
Following the report setup, experiments are conducted on two maps:
- 4x4 map with 4 holes
- 10x10 map with 25 holes
The environment uses deterministic transitions and shaped rewards (a step-function sketch follows this list):
- step reward = -0.02
- wall collision reward = -0.1 (agent stays in place)
- trap reward = -1.0
- goal reward = 1.0
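To make the reward scheme concrete, here is a minimal sketch of a deterministic step function using the values above; the map encoding (`.`, `H`, `G`) and the function itself are illustrative, not the actual `environment.py` implementation:

```python
# Map cells: '.' = free, 'H' = trap (hole), 'G' = goal (encoding is illustrative).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(grid, pos, action):
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
        return pos, -0.1, False      # wall collision: agent stays in place
    if grid[r][c] == "H":
        return (r, c), -1.0, True    # trap ends the episode
    if grid[r][c] == "G":
        return (r, c), 1.0, True     # goal ends the episode
    return (r, c), -0.02, False      # ordinary step cost
```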
Summary of the report's findings:
- On 4x4, SARSA and Q-learning converge faster, while Monte Carlo learns more slowly due to delayed episode-end updates.
- On 10x10, SARSA is typically smoother under persistent exploration, Q-learning is stronger but can fluctuate more, and Monte Carlo requires a decaying-epsilon strategy (see the sketch after this list) to reach stable shortest-path performance.
- Under the final configuration, all methods can achieve high test success, while policy behaviors differ more clearly on the harder 10x10 map.
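The decaying-epsilon strategy mentioned for Monte Carlo can be as simple as a multiplicative schedule. The sketch below shows the general idea; the constants are placeholders, not the values used in this project's configuration:

```python
def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.999):
    # Explore heavily in early episodes, act near-greedily later on.
    # All constants are illustrative placeholders.
    return max(eps_min, eps_start * decay ** episode)
```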
Figures generated by the experiments (image files are written to the output folders):
- Training success rate comparison
- Monte Carlo final policy
- SARSA final policy
- Q-learning final policy
- Python 3.9+
- Dependencies listed in `requirements.txt`
```bash
cd path/to/A0318887H
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

Run everything with the defaults:

```bash
python main.py
```

Common examples:
```bash
python main.py --algo mc --grid 4
python main.py --algo all --grid all --out results --seed 1
```

Arguments:
- `--algo`: `mc`, `sarsa`, `ql`, or `all`
- `--grid`: `4`, `10`, or `all`
- `--episodes_4`, `--episodes_10`: training episodes for the 4x4 and 10x10 maps
- `--seed`: random seed
- `--out`: output directory
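For orientation, here is a minimal sketch of how `main.py` might parse these flags with `argparse`; the default values shown are assumptions, not necessarily the repository's:

```python
import argparse

def parse_args():
    # Mirrors the documented CLI; default values are illustrative assumptions.
    p = argparse.ArgumentParser(description="Tabular RL grid path planning")
    p.add_argument("--algo", choices=["mc", "sarsa", "ql", "all"], default="all")
    p.add_argument("--grid", choices=["4", "10", "all"], default="all")
    p.add_argument("--episodes_4", type=int, default=2000)    # placeholder default
    p.add_argument("--episodes_10", type=int, default=20000)  # placeholder default
    p.add_argument("--seed", type=int, default=1)
    p.add_argument("--out", default="results")
    return p.parse_args()
```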
Results are saved under `results/{4x4,10x10}/{mc,sarsa,ql,compare}/`.
- Algorithm folders include Q-tables, final policy image, training curves, test curves, and test summaries.
- `compare` folders include combined comparison figures across algorithms.
- Environment and map settings are defined in `environment.py` and `config.py`.
- Default rewards and hyperparameters are configured in `config.py`.
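As an illustration of what those defaults look like, here is a hypothetical `config.py`-style snippet; the reward values are the ones listed above, while the hyperparameters are placeholders rather than the repository's actual numbers:

```python
# Hypothetical config sketch. Rewards match the values documented above;
# the hyperparameters are placeholders, not the repository's actual defaults.
REWARDS = {
    "step": -0.02,   # per-move cost
    "wall": -0.1,    # collision; agent stays in place
    "trap": -1.0,    # falling into a hole ends the episode
    "goal": 1.0,     # reaching the goal ends the episode
}

HYPERPARAMS = {
    "alpha": 0.1,    # learning rate (placeholder)
    "gamma": 0.99,   # discount factor (placeholder)
    "epsilon": 0.1,  # exploration rate (placeholder)
}
```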




