This project studies deterministic grid-based path planning with tabular reinforcement learning and compares three classic methods (update rules sketched after this list):
- Monte Carlo Control
- SARSA
- Q-learning
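The three methods differ mainly in the target used to update the action-value table Q(s, a). As a point of reference, here is a minimal sketch of the three update rules; the function names, table layout, and `alpha`/`gamma` arguments are illustrative, not taken from this repository:

```python
import numpy as np

# Q is a 2D table Q[s, a]; alpha is the learning rate, gamma the discount factor.

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy TD target: bootstrap from the greedy action in s_next.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy TD target: bootstrap from the action actually taken next.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def monte_carlo_update(Q, episode, alpha, gamma):
    # Update only after the episode ends, toward the full discounted return.
    G = 0.0
    for s, a, r in reversed(episode):  # episode: list of (state, action, reward)
        G = r + gamma * G
        Q[s, a] += alpha * (G - Q[s, a])
```

The episode-end update in the Monte Carlo variant is exactly what makes its learning slower on these maps, as noted in the findings below.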
Following the report setup, experiments are conducted on two maps:
- 4x4 map with 4 holes
- 10x10 map with 25 holes
The environment uses deterministic transitions and shaped rewards (a step-function sketch follows this list):
- step reward = -0.02
- wall collision reward = -0.1 (agent stays in place)
- trap reward = -1.0
- goal reward = 1.0
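To make the reward scheme concrete, here is a minimal sketch of a deterministic step function using the values above; the map encoding (`.`, `H`, `G`) and the function itself are illustrative, not the actual `environment.py` implementation:

```python
# Map cells: '.' = free, 'H' = trap (hole), 'G' = goal (encoding is illustrative).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(grid, pos, action):
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
        return pos, -0.1, False      # wall collision: agent stays in place
    if grid[r][c] == "H":
        return (r, c), -1.0, True    # trap ends the episode
    if grid[r][c] == "G":
        return (r, c), 1.0, True     # goal ends the episode
    return (r, c), -0.02, False      # ordinary step cost
```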
Summary of the report's findings:
- On 4x4, SARSA and Q-learning converge faster, while Monte Carlo learns more slowly due to delayed episode-end updates.
- On 10x10, SARSA is typically smoother under persistent exploration, Q-learning is stronger but can fluctuate more, and Monte Carlo requires a decaying-epsilon strategy (see the sketch after this list) to reach stable shortest-path performance.
- Under the final configuration, all methods can achieve high test success, while policy behaviors differ more clearly on the harder 10x10 map.
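The decaying-epsilon strategy mentioned for Monte Carlo can be as simple as a multiplicative schedule. The sketch below shows the general idea; the constants are placeholders, not the values used in this project's configuration:

```python
def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.999):
    # Explore heavily in early episodes, act near-greedily later on.
    # All constants are illustrative placeholders.
    return max(eps_min, eps_start * decay ** episode)
```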
Figures generated by the experiments (image files are written to the output folders):
- Training success rate comparison
- Monte Carlo final policy
- SARSA final policy
- Q-learning final policy
- Python 3.9+
- Dependencies listed in `requirements.txt`
```bash
cd path/to/A0318887H
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

Run everything with the defaults:

```bash
python main.py
```

Common examples:
```bash
python main.py --algo mc --grid 4
python main.py --algo all --grid all --out results --seed 1
```

Arguments:
- `--algo`: `mc`, `sarsa`, `ql`, or `all`
- `--grid`: `4`, `10`, or `all`
- `--episodes_4`, `--episodes_10`: training episodes for the 4x4 and 10x10 maps
- `--seed`: random seed
- `--out`: output directory
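For orientation, here is a minimal sketch of how `main.py` might parse these flags with `argparse`; the default values shown are assumptions, not necessarily the repository's:

```python
import argparse

def parse_args():
    # Mirrors the documented CLI; default values are illustrative assumptions.
    p = argparse.ArgumentParser(description="Tabular RL grid path planning")
    p.add_argument("--algo", choices=["mc", "sarsa", "ql", "all"], default="all")
    p.add_argument("--grid", choices=["4", "10", "all"], default="all")
    p.add_argument("--episodes_4", type=int, default=2000)    # placeholder default
    p.add_argument("--episodes_10", type=int, default=20000)  # placeholder default
    p.add_argument("--seed", type=int, default=1)
    p.add_argument("--out", default="results")
    return p.parse_args()
```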
Results are saved under `results/{4x4,10x10}/{mc,sarsa,ql,compare}/`.
- Algorithm folders include Q-tables, final policy image, training curves, test curves, and test summaries.
- `compare` folders include combined comparison figures across algorithms.
- Environment and map settings are defined in `environment.py` and `config.py`.
- Default rewards and hyperparameters are configured in `config.py`.
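As an illustration of what those defaults look like, here is a hypothetical `config.py`-style snippet; the reward values are the ones listed above, while the hyperparameters are placeholders rather than the repository's actual numbers:

```python
# Hypothetical config sketch. Rewards match the values documented above;
# the hyperparameters are placeholders, not the repository's actual defaults.
REWARDS = {
    "step": -0.02,   # per-move cost
    "wall": -0.1,    # collision; agent stays in place
    "trap": -1.0,    # falling into a hole ends the episode
    "goal": 1.0,     # reaching the goal ends the episode
}

HYPERPARAMS = {
    "alpha": 0.1,    # learning rate (placeholder)
    "gamma": 0.99,   # discount factor (placeholder)
    "epsilon": 0.1,  # exploration rate (placeholder)
}
```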




