Code written by Dan Vicente (danvi@kth.se) and Erik Lindé (elinde2@kth.se) during the spring of 2024, except for the code for parsing the CIFAR-10 dataset as a torch data object, which was written by Lei Mao (https://github.com/leimao) in 2021 and modified by us in 2024.
This repository contains all the scripts used in our project in SF2568 at KTH. To get all the scripts and classes onto your system, we recommend that you simply clone this repository into a suitable directory. Full instructions for this project's dependencies follow.
EASGD is a parallel algorithm for training neural networks and acts as a parallel alternative to regular SGD or to momentum-based methods such as MSGD and Adam. For the paper describing the idea and the algorithm, we refer to [1].
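To give a feel for the method, the sketch below simulates the EASGD update from [1] sequentially on a toy quadratic objective. The elastic term alpha * (x_i - x_tilde) pulls each worker toward a shared center variable and the center toward the workers. This is only an illustrative sketch of the update rule, not the MPI implementation in this repository; the objective, step sizes, and dimensions are arbitrary choices.

```python
import numpy as np

def easgd_quadratic(num_workers=4, steps=200, eta=0.05, rho=1.0, seed=0):
    """Toy sequential simulation of EASGD on f(x) = 0.5 * ||x||^2.

    Each "worker" holds its own parameter vector x_i; the center variable
    x_tilde is coupled to the workers through the elastic force
    alpha * (x_i - x_tilde), with alpha = eta * rho.
    """
    rng = np.random.default_rng(seed)
    dim = 3
    workers = [rng.normal(size=dim) for _ in range(num_workers)]
    center = np.zeros(dim)
    alpha = eta * rho
    for _ in range(steps):
        # Center moves toward the average of the workers.
        new_center = center + alpha * sum(x - center for x in workers)
        for i, x in enumerate(workers):
            grad = x  # gradient of 0.5*||x||^2, standing in for a minibatch gradient
            workers[i] = x - eta * grad - alpha * (x - center)
        center = new_center
    return center, workers
```

On this convex toy problem both the center and the workers contract toward the minimizer at the origin, which is the qualitative behavior the elastic coupling is designed to encourage.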
We have evaluated the performance of both the EASGD and EAMSGD algorithms introduced in [1] using CPUs communicating via the Message Passing Interface (MPI).
To download Open MPI, follow the link and download the desired release: https://www.open-mpi.org/
After the download, unzip/untar into a folder openmpi-2.0.x/ and run
> cd openmpi-2.0.x/
> ./configure --prefix=$HOME/your/path/here
> make all
> make install
> $HOME/your/path/here/mpirun --version
mpirun (Open MPI) 2.0.x
Report bugs to http://www.open-mpi.org/community/help/
To install Libtorch, follow the link and download the build appropriate for your system configuration: https://pytorch.org/
NOTE: we are using a Mac (with M1) and thus the default C++/Java build installed to /usr/local/libtorch, but you may specify the exact path in /src/CMakeLists.txt.
The datasets we have used are ../dataset/mnist and ../dataset/cifar-10-batches-bin from the following links,
We use CMake to build/compile the scripts. At the time of writing, the build is configured in ../src/CMakeLists.txt. From the repository root, run
> cd build
> cmake ../src
> cmake --build . --config Release
> cd ..
If everything is set up correctly, you should have executables for running the MSGD (sequential) script and the EASGD and EAMSGD (parallel) scripts for both the MNIST and CIFAR-10 datasets.
EXAMPLE: Training the CNN on the CIFAR-10 dataset with EAMSGD:
> cd src
> mpiexec -n 9 ./eamsgd_cifar 4 0.125 0.9
EXAMPLE: Training the CNN on the MNIST dataset with EAMSGD:
> cd src
> mpiexec -n 5 ./eamsgd_mnist 2 0.25 0.9
EXAMPLE: Training the CNN using MSGD on the CIFAR-10 dataset:
> cd src
> ./msgd_cifar 0.9
There are some predefined experiment setups contained in bash scripts (e.g., ../src/experiments.sh, ../src/experiment_easgd.sh).
EXAMPLE: Running EAMSGD and MSGD training for many different values of the hyperparameters (this generates most of the figures in the report):
> cd src
> sh ./experiments.sh
WARNING: This bash job can take several days to finish, depending on the grid size.
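For readers who want to script their own sweep, the sketch below builds a grid of mpiexec invocations like the ones in the run examples above. It is a hypothetical dry-run driver, not a reproduction of ../src/experiments.sh: the parameter names (communication period, learning rate, momentum) and the grid values are our assumptions about the three positional arguments, and only the command strings are generated, nothing is executed.

```python
import itertools

def build_experiment_commands(procs=(5, 9), taus=(1, 2, 4),
                              etas=(0.0625, 0.125), momentum=0.9,
                              binary="./eamsgd_cifar"):
    """Generate mpiexec command lines over a hyperparameter grid.

    Assumes the positional arguments are (tau, eta, momentum); verify
    against the actual executables before running anything.
    """
    cmds = []
    for n, tau, eta in itertools.product(procs, taus, etas):
        cmds.append(f"mpiexec -n {n} {binary} {tau} {eta} {momentum}")
    return cmds
```

Printing the returned list (or piping it into a scheduler) makes it easy to inspect the full grid before committing to a multi-day run.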
[1] Sixin Zhang, Anna Choromanska, Yann LeCun, "Deep learning with Elastic Averaging SGD", https://arxiv.org/abs/1412.6651