py533/ME5406_1

ME5406 Project 1 - FrozenLake Reinforcement Learning

Project Overview

This project studies deterministic grid-based path planning with tabular reinforcement learning and compares three classic methods:

  • Monte Carlo Control
  • SARSA
  • Q-learning
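The two temporal-difference methods differ only in their bootstrap target: Q-learning bootstraps from the greedy action, SARSA from the action actually taken. A minimal sketch of the tabular updates (hyperparameter values here are illustrative, not necessarily the project's):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the best action available in s_next.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action a_next actually chosen in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

Monte Carlo control, by contrast, waits for the episode to end and updates each visited state-action pair toward the observed return, which explains its slower convergence noted below.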

Following the report setup, experiments are conducted on two maps:

  • 4x4 map with 4 holes
  • 10x10 map with 25 holes

The environment uses deterministic transitions and shaped rewards:

  • step reward = -0.02
  • wall collision reward = -0.1 (agent stays in place)
  • trap reward = -1.0
  • goal reward = 1.0
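The reward scheme above can be expressed as a small lookup; this is a hypothetical mirror of the listed values (the project's actual definitions live in config.py and environment.py, and `reward_for` is an illustrative name):

```python
# Shaped rewards as listed in the README; actual values are set in config.py.
STEP_REWARD = -0.02
WALL_REWARD = -0.1   # collision: agent stays in place
TRAP_REWARD = -1.0
GOAL_REWARD = 1.0

def reward_for(cell, hit_wall):
    """Return (reward, episode_done) for the cell the agent lands on."""
    if hit_wall:
        return WALL_REWARD, False
    if cell == "H":      # hole / trap
        return TRAP_REWARD, True
    if cell == "G":      # goal
        return GOAL_REWARD, True
    return STEP_REWARD, False  # ordinary frozen cell
```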

Key findings from the report:

  • On 4x4, SARSA and Q-learning converge faster, while Monte Carlo learns more slowly due to delayed episode-end updates.
  • On 10x10, SARSA is typically smoother under persistent exploration, Q-learning is stronger but can fluctuate more, and Monte Carlo requires a decaying-epsilon strategy to reach stable shortest-path performance.
  • Under the final configuration, all methods can achieve high test success, while policy behaviors differ more clearly on the harder 10x10 map.
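The decaying-epsilon strategy mentioned for Monte Carlo on the 10x10 map can be sketched as a multiplicative schedule with a floor; the parameter values here are illustrative, with the project's actual settings in config.py:

```python
def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.999):
    # Exploration shrinks geometrically per episode but is clipped at a
    # floor, so the agent never stops exploring entirely during training.
    return max(eps_min, eps_start * decay ** episode)
```

Early episodes explore almost uniformly; late episodes mostly exploit the learned greedy policy, which is what lets Monte Carlo settle onto a stable shortest path.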

Key Output Figures

4x4 Comparison

  • Training success rate comparison (figure: 4x4 Training Success Rate Comparison)

10x10 Comparison

  • Training success rate comparison (figure: 10x10 Training Success Rate Comparison)

10x10 Final Policies

  • Monte Carlo final policy (figure: 10x10 Monte Carlo Final Policy)
  • SARSA final policy (figure: 10x10 SARSA Final Policy)
  • Q-learning final policy (figure: 10x10 Q-learning Final Policy)

Requirements

  • Python 3.9+
  • Dependencies in requirements.txt

Setup

cd path/to/A0318887H
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Run

python main.py

Common examples:

python main.py --algo mc --grid 4
python main.py --algo all --grid all --out results --seed 1

Arguments:

  • --algo: mc, sarsa, ql, all
  • --grid: 4, 10, all
  • --episodes_4, --episodes_10: number of training episodes for the 4x4 and 10x10 maps, respectively
  • --seed: random seed
  • --out: output directory
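A command-line interface matching the flags above could be built with argparse; this is a sketch mirroring the documented options (the actual parser lives in main.py, and the default episode counts shown here are hypothetical):

```python
import argparse

def build_parser():
    # Illustrative parser mirroring the README's documented flags.
    p = argparse.ArgumentParser(description="FrozenLake tabular RL experiments")
    p.add_argument("--algo", choices=["mc", "sarsa", "ql", "all"], default="all")
    p.add_argument("--grid", choices=["4", "10", "all"], default="all")
    p.add_argument("--episodes_4", type=int, default=10_000)   # hypothetical default
    p.add_argument("--episodes_10", type=int, default=50_000)  # hypothetical default
    p.add_argument("--seed", type=int, default=1)
    p.add_argument("--out", default="results")
    return p
```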

Output

Results are saved under results/{4x4,10x10}/{mc,sarsa,ql,compare}/.

  • Algorithm folders include Q-tables, final policy image, training curves, test curves, and test summaries.
  • compare folders include combined comparison figures across algorithms.

Notes

  • Environment and map settings are defined in environment.py and config.py.
  • Default rewards and hyperparameters are configured in config.py.
