First systematic empirical study of Steepest Perturbed Gradient Descent (SPGD) [Vahedi & Ilies, 2024] on machine-learning loss landscapes.
This repository contains the full code, configurations, figures, and result artefacts for the WPI MA 551 (Computational Statistics) final project, "Does Steepest Selection Help? An Empirical Study of Saddle-Point-Escaping Optimizers on Non-Convex Machine-Learning Loss Landscapes" by Shyam Patadia (Spring 2026).
SPGD is a candidate-selection rule on top of gradient descent: every IterP steps, sample N_P uniform candidates around the current iterate and commit the one that yields the steepest loss decrease. The original paper proposed it but never tested it on neural networks and never compared against a same-compute random-selection control. This work does both.
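The selection rule described above can be sketched on a toy problem. This is a minimal NumPy illustration, not the package's implementation: the function name `perturbed_gd`, the box-shaped uniform sampling, the `radius` parameter, and the choice to apply the same ≤ acceptance test to the random control are all assumptions made for the example.

```python
import numpy as np

def rastrigin(x):
    """Rastrigin test function; global minimum 0 at the origin."""
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def rastrigin_grad(x):
    """Analytic gradient of the Rastrigin function."""
    return 2.0 * x + 20.0 * np.pi * np.sin(2.0 * np.pi * x)

def perturbed_gd(f, grad, x0, lr=0.002, steps=200, iter_p=10, n_p=8,
                 radius=0.5, select="steepest", seed=0):
    """Gradient descent with a multi-candidate perturbation every iter_p steps.

    select="steepest" mimics SPGD (keep the lowest-loss candidate);
    select="random" mimics the same-compute random control.
    All hyperparameter defaults are illustrative, not the study's settings.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    accepted, trials = 0, 0
    for t in range(1, steps + 1):
        x = x - lr * grad(x)                      # ordinary gradient step
        if t % iter_p == 0:
            trials += 1
            # n_p candidates drawn uniformly from a box around the iterate
            cands = x + rng.uniform(-radius, radius, size=(n_p, x.size))
            losses = np.array([f(c) for c in cands])
            if select == "steepest":
                k = int(np.argmin(losses))        # largest loss decrease
            else:
                k = int(rng.integers(n_p))        # random candidate
            if losses[k] <= f(x):                 # assumed <= acceptance test
                x = cands[k]
                accepted += 1
    return x, accepted, trials

x0 = np.array([2.3, -1.7])
for mode in ("steepest", "random"):
    x, acc, trials = perturbed_gd(rastrigin, rastrigin_grad, x0, select=mode)
    print(mode, "final loss:", rastrigin(x), "accepted:", acc, "of", trials)
```

Both modes evaluate the same number of candidates per perturbation, so the only difference between them is which candidate is proposed for acceptance.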
Headline result. On CIFAR-10/ResNet-18 with three paired seeds, SPGD beats its random-perturbation control (RPGD) on every seed with a paired-mean test-accuracy gap of +3.78 percentage points. On a non-convex matrix-completion benchmark with documented saddle structure (MovieLens-100K, Burer–Monteiro), SPGD's selection rule fires at 30% acceptance vs 9% for the random control — direct evidence that the mechanism is empirically active on a real ML problem.
It's not the noise. It's the choice.
.
├── src/spgd_study/ # Importable package
│ ├── optimizers/ # SPGD, RPGD, PGD as torch.optim.Optimizer subclasses
│ ├── benchmarks.py # Rastrigin / Ackley / Rosenbrock
│ ├── data.py # OpenML / CIFAR-10 / MovieLens loaders
│ ├── diagnostics.py # Stagnation episodes + escape-time logger
│ ├── models.py # MLP / ResNet-18 (BN-disabled in early layers)
│ ├── runner.py # Synthetic / Two Moons trainer
│ ├── nn_runner.py # Mini-batch trainer (OpenML, CIFAR, etc.)
│ └── utils.py # Seeding, paired-protocol helpers
├── experiments/ # One script per experiment + plotting + configs
│ ├── configs/ # YAML hyperparameters per experiment
│ ├── exp1_benchmarks.py # Synthetic non-convex benchmarks
│ ├── exp2_two_moons.py # Two Moons MLP (visualisation)
│ ├── exp3_openml.py # OpenML-CC18 tabular MLP
│ ├── exp4_cifar10.py # ResNet-18 anchor experiment
│ ├── exp5_ablation.py # N_P × IterP grid
│ ├── exp6_matrix_completion.py
│ ├── compute_exp6_escape_time.py
│ └── plot_exp{1..6}.py # All plotting scripts (PNG out)
├── tests/ # Smoke tests for optimizers, OpenML loader
├── turing/ # WPI Turing cluster Slurm scripts
├── figures/ # All PNG figures used in the report
├── results/ # CSVs and JSON summaries used by the report
├── report/ # LaTeX sources for report + presentation
├── pyproject.toml # uv-managed project
└── uv.lock # Pinned dependency graph
This project is managed end-to-end with uv.
Install uv first (one-time):
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Then install the project's pinned environment:
uv sync --python 3.11
This creates .venv/ and installs PyTorch (CUDA 12.1 wheels), torchvision,
NumPy, scikit-learn, OpenML, pandas, matplotlib, etc., exactly as
specified in uv.lock.
Verify the install:
uv run python -c "import torch; print('cuda:', torch.cuda.is_available())"
All scripts are run via uv run so they pick up the project's locked
environment automatically. Each experiment writes results into
results/ and figures into figures/.
uv run python experiments/exp1_benchmarks.py --config experiments/configs/exp1.yaml
uv run python experiments/plot_exp1.py
uv run python experiments/exp2_two_moons.py --config experiments/configs/exp2.yaml
uv run python experiments/plot_exp2.py
uv run python experiments/exp3_openml.py --config experiments/configs/exp3.yaml
uv run python experiments/plot_exp3.py
Local single-seed (any CUDA GPU):
uv run python experiments/exp4_cifar10.py --config experiments/configs/exp4.yaml --seed 0 --optimizer spgd
On the WPI Turing cluster (Slurm array, all 5 optimizers × 3 seeds):
sbatch turing/slurm_exp4.sbatch
Plot:
uv run python experiments/plot_exp4.py
uv run python experiments/exp5_ablation.py --config experiments/configs/exp5.yaml
uv run python experiments/plot_exp5.py
uv run python experiments/exp6_matrix_completion.py --config experiments/configs/exp6.yaml
uv run python experiments/compute_exp6_escape_time.py
uv run python experiments/plot_exp6.py

| Name | Description |
|---|---|
| SGD | Vanilla stochastic gradient descent. First-order baseline. |
| Adam | Per-parameter adaptive scaling. Modern ML default. |
| PGD | Gradient descent with one Gaussian perturbation when ‖∇f‖ is small (Jin et al., 2017). |
| RPGD | (novel control) Same multi-candidate perturbation as SPGD, but commits a random candidate. |
| SPGD | Commits the steepest of N_P uniform candidates, subject to the paper's ≤ acceptance rule. |
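To make the PGD row concrete, here is a minimal sketch of a gradient-norm-triggered Gaussian perturbation on a two-dimensional function with a saddle at the origin. The toy loss, the threshold `g_thresh`, the noise scale `sigma`, and the `cooldown` between perturbations are assumptions chosen for illustration; they are not taken from this repository's PGD implementation.

```python
import numpy as np

def loss(p):
    """Toy loss with a saddle at (0, 0) and minima at (0, +1) and (0, -1)."""
    x, y = p
    return x**2 + 0.25 * y**4 - 0.5 * y**2

def grad(p):
    """Analytic gradient of the toy loss."""
    x, y = p
    return np.array([2.0 * x, y**3 - y])

def pgd(p0, lr=0.1, steps=500, g_thresh=1e-3, sigma=0.01,
        cooldown=20, seed=0, perturb=True):
    """Gradient descent that adds Gaussian noise when the gradient is small.

    The cooldown (an assumption here) prevents a perturbation on every
    step while the iterate is still close to a stationary point.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float).copy()
    last_kick = -cooldown
    for t in range(steps):
        g = grad(p)
        small = np.linalg.norm(g) < g_thresh
        if perturb and small and t - last_kick >= cooldown:
            p = p + sigma * rng.standard_normal(p.size)  # Gaussian kick
            last_kick = t
            g = grad(p)
        p = p - lr * g
    return p

stuck = pgd([0.0, 0.0], perturb=False)   # plain GD never leaves the saddle
escaped = pgd([0.0, 0.0], perturb=True)  # perturbed GD reaches a minimum
print("plain GD loss:", loss(stuck))
print("PGD loss:", loss(escaped))
```

Started exactly at the saddle, plain gradient descent sees a zero gradient and does not move, while the perturbed variant leaves along the negative-curvature direction.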
The SPGD-vs-RPGD contrast is the cleanest test of SPGD's central
claim: both algorithms perturb the same way and pay the same compute
cost, so any difference is attributable to the selection rule.
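The paired comparison behind this contrast can be sketched as follows. The helper name `paired_gap` is hypothetical, and the accuracy values are placeholders invented for the example; they are not results from this study.

```python
import numpy as np

def paired_gap(acc_a, acc_b):
    """Per-seed differences between two optimizers run on the same seeds.

    acc_a[i] and acc_b[i] must come from the same seed, so that each
    difference isolates the optimizer rather than the initialisation.
    """
    a = np.asarray(acc_a, dtype=float)
    b = np.asarray(acc_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both optimizers need one result per shared seed")
    diffs = a - b
    return {
        "per_seed": diffs,
        "mean": float(diffs.mean()),      # paired-mean gap
        "wins": int((diffs > 0).sum()),   # seeds where A beats B
    }

# Placeholder test accuracies (percent) for three shared seeds.
spgd_acc = [70.0, 71.0, 72.0]
rpgd_acc = [69.0, 69.5, 70.0]
summary = paired_gap(spgd_acc, rpgd_acc)
print("per-seed gaps:", summary["per_seed"])
print("paired-mean gap:", summary["mean"], "wins:", summary["wins"])
```

Differencing within each seed removes seed-to-seed variance that both optimizers share, which matters when only a few seeds are available.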
Implementations live in src/spgd_study/optimizers/.

| Item | Details |
|---|---|
| Local | Windows 11, RTX 4050; used for Experiments 1, 2, 3, 5, 6 |
| Cluster | WPI Turing, NVIDIA A30 GPU; used for Experiment 4 (CIFAR-10/ResNet-18) |
| Package manager | uv |
| Cluster transfer | scp |

LaTeX sources for both the written report and the presentation live in
report/:
- report.tex: main report (arXiv-style)
- presentation.tex: Beamer presentation
- references.bib: bibliography
Build with:
cd report
pdflatex report && bibtex report && pdflatex report && pdflatex report
pdflatex presentation && pdflatex presentation
If you reference this work:
@misc{patadia2026spgd,
author = {Shyam Patadia},
title = {Does Steepest Selection Help? An Empirical Study of
Saddle-Point-Escaping Optimizers on Non-Convex Machine-Learning
Loss Landscapes},
year = {2026},
howpublished = {WPI MA 551 Computational Statistics, Final Project},
url = {https://github.com/shyampatadia/spgd_research}
}
The original SPGD paper:
@article{vahedi2024spgd,
title = {SPGD: Steepest Perturbed Gradient Descent Optimization},
author = {Vahedi, Amir M. and Ilies, Horea T.},
journal = {arXiv preprint arXiv:2411.04946},
year = {2024}
}
The author thanks the WPI Academic Computing team for access to the Turing cluster, on which the CIFAR-10 anchor experiment was carried out.
Code in this repository is released under the MIT License (see
LICENSE if present, otherwise add one before distributing).
The third-party SPGD paper sources under spgd_paper/ are excluded from
the repository for copyright reasons.