Advanced Machine Learning (in French: Machine Learning Avancé, MLA)
The goal of this project is to reproduce the results of a recent research paper in the field of Deep Learning (DL). Our group chose the paper "Net2Net: Accelerating Learning via Knowledge Transfer" by Chen et al. (2016), which introduces the Net2Net techniques for quickly transferring the knowledge of a previously trained network to a wider or deeper one.
To create and activate a conda virtual environment, run the following commands:

```bash
conda create -n mla python=3.10
conda activate mla
```
Then install a version of PyTorch with CUDA support (if you have an NVIDIA GPU) that is compatible with your NVIDIA driver version. Have a look at the PyTorch website to find the right command to run. For instance, if your GPU driver supports CUDA 11.8, run:

```bash
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```
Install the other dependencies by running the following command from the root directory of the project:

```bash
pip3 install -r requirements.txt
```
To execute the Jupyter notebooks, install the `ipykernel` package in the conda environment and register a kernel for it:

```bash
pip3 install ipykernel
python -m ipykernel install --user --name=mla
```
Try running the demo notebooks `src/inceptionv2_cifar/demo_net2wider.ipynb` and `src/inceptionv2_cifar/demo_net2deeper.ipynb` to check that everything is working fine.
To exit the virtual environment, and remove it if needed, run the following commands:

```bash
conda deactivate
conda remove -n mla --all
```
All the scripts in this repository should be run from the root directory of the project (so that paths are resolved correctly). For instance, to run the script `src/inceptionv2_cifar/main.py`, use the following command:

```bash
python3 src/inceptionv2_cifar/main.py
```
When developing a neural network for a specific task, it is common to start with a relatively simple architecture and then make it more complex to achieve better performance. In such a workflow, each new architecture is generally trained from scratch and does not take advantage of what the previous architectures have learned, which is costly in both time and money. To address this problem, the authors of the paper propose a strategy for transferring the knowledge learned by a neural network to a larger one, so that the latter takes less time to train.
Beyond this application, the authors also raise the idea of using their approach to build lifelong learning systems. Indeed, it is common to want to increase the capabilities of an existing neural network by training it on a larger dataset, and in that case the model must be made more complex to capture the larger data distribution. Here again, rather than training a new, more complex architecture from scratch, it is better to take advantage of the knowledge acquired by the previous model.
The main idea of the paper is "function-preserving initialization": after training a teacher network, we build a more complex student network whose weights are initialized so that it produces exactly the same outputs as the teacher. Training the student by gradient descent then starts from the teacher's level of performance, so the student is guaranteed to end up at least as good as the teacher.
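In the paper's notation, if the teacher computes $y = f(x; \theta)$, the student's parameters $\theta'$ are chosen so that $\forall x,\; g(x; \theta') = f(x; \theta)$. Concretely, the Net2WiderNet rule (with the replication map written $\phi$ here to avoid clashing with the student $g$) widens layer $i$ from $n$ to $q > n$ units using a random mapping with $\phi(j) = j$ for $j \le n$ and $\phi(j)$ drawn uniformly from $\{1, \dots, n\}$ otherwise; the new weights are

$$U^{(i)}_{k,j} = W^{(i)}_{k,\phi(j)}, \qquad U^{(i+1)}_{j,h} = \frac{1}{|\{x : \phi(x) = \phi(j)\}|}\, W^{(i+1)}_{\phi(j),h},$$

so each replicated unit's outgoing weights are divided by its replication count, and the next layer receives the same pre-activations as before.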
The authors propose two techniques for increasing the complexity of a network. The first, Net2WiderNet, increases the number of neurons in a layer (or, equivalently for CNNs, the number of filters of a convolution) without changing the network's predictions; it can be applied to several layers of the network. The second, Net2DeeperNet, increases the number of layers of the network, again without changing its predictions. The two techniques can be combined to obtain a student network that is both wider and deeper than the teacher.
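To make the widening rule concrete, here is a minimal PyTorch sketch (ours, independent of the implementation under `src/`; all names are illustrative) that widens the hidden layer of a toy MLP and checks that the teacher's function is preserved:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy teacher: a 4 -> 3 -> 2 MLP (names are illustrative, not from the repo).
teacher = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

def net2wider(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
    """Widen fc1's output (and fc2's input) from n to new_width units,
    preserving the function computed by fc2(relu(fc1(x)))."""
    n = fc1.out_features
    assert new_width > n
    # Random mapping: unit j of the wide layer copies teacher unit mapping[j];
    # the first n units copy themselves.
    mapping = torch.cat([torch.arange(n), torch.randint(0, n, (new_width - n,))])
    counts = torch.bincount(mapping, minlength=n).float()  # replication counts

    wide1 = nn.Linear(fc1.in_features, new_width)
    wide2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        wide1.weight.copy_(fc1.weight[mapping])  # copy incoming weights
        wide1.bias.copy_(fc1.bias[mapping])
        # Divide outgoing weights by how many times each teacher unit was
        # copied, so the next layer's pre-activations are unchanged.
        wide2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        wide2.bias.copy_(fc2.bias)
    return wide1, wide2

wide1, wide2 = net2wider(teacher[0], teacher[2], new_width=5)
student = nn.Sequential(wide1, nn.ReLU(), wide2)

x = torch.randn(8, 4)
assert torch.allclose(teacher(x), student(x), atol=1e-6)
print("Function preserved: student output matches teacher output.")
```

Net2DeeperNet proceeds analogously by inserting a new layer initialized to the identity; a convolutional sketch is given after the task list below.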
- Update the "Quick Start" section of the README.md file to help users install PyTorch with CUDA support for a different NVIDIA GPU driver version.
- Gather the two Net2Net techniques in a single class, and create a package for it.
- Apply Net2WiderNet to Inception-V2 (the code needs to be adapted to take batch normalization and concatenation into account).
- Apply Net2DeeperNet to Inception-V2 (a minimal identity-initialization sketch is given after this list).
- Establish the pipeline to reproduce the results of the paper and set up the experiments on the university's GPU cluster.
- Update the README files in the `src` directory to describe the code.
- Download the ImageNet dataset and train Inception-V2 from scratch on it.
- Add dropout to the Inception-V2 architecture (or some random noise to the replicated weights) to help the student network learn to use its full capacity.
- Implement the "Random pad" baseline method to compare the Net2WiderNet technique against (a sketch is given after this list).
- Introduce a multiplicative factor to modulate the number of output filters of each branch of the Inception module, and check that setting this factor to $\sqrt{0.3}$ (as in the paper) reduces the number of parameters by 60%.
- Split the code of the Net2WiderNet technique into several functions to make it more readable.
- Create a package for models (LeNet, Inception-V2).
- Add a method to widen multiple layers of a network.
- Update the parameter files for Inception-V2/ImageNet and LeNet/MNIST to match that of Inception-V2/CIFAR-10 (and the training code).
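As referenced in the Net2DeeperNet item above, here is a minimal sketch (ours, not the repository's code) of the identity initialization that makes an inserted convolution function-preserving in ReLU networks, since $\mathrm{ReLU}(\mathrm{ReLU}(x)) = \mathrm{ReLU}(x)$. Handling batch normalization, as the Net2WiderNet item notes, requires extra care (the new BN layer's scale and shift must undo its normalization) and is omitted here:

```python
import torch
import torch.nn as nn

def identity_conv(channels: int, kernel_size: int = 3) -> nn.Conv2d:
    """Return a conv layer initialized to compute the identity:
    for each channel, a single 1 at the kernel center, zeros elsewhere."""
    assert kernel_size % 2 == 1, "an odd kernel is needed for a centered identity"
    conv = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, bias=True)
    with torch.no_grad():
        conv.weight.zero_()
        center = kernel_size // 2
        for c in range(channels):
            conv.weight[c, c, center, center] = 1.0
        conv.bias.zero_()
    return conv

# Deepen a toy conv stack: the inserted layer preserves the function
# because ReLU(ReLU(x)) == ReLU(x).
teacher = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
student = nn.Sequential(teacher[0], nn.ReLU(), identity_conv(8), nn.ReLU())

x = torch.randn(2, 3, 16, 16)
assert torch.allclose(teacher(x), student(x))
print("Inserted layer is function-preserving.")
```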
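"Random pad", the baseline referenced above, widens a layer by padding the weight matrices with freshly initialized random values instead of replicating existing units, so unlike Net2WiderNet it is not function-preserving. A rough sketch under that reading of the baseline (names are ours):

```python
import torch
import torch.nn as nn

def random_pad(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
    """Widen fc1/fc2 by padding their weight matrices with random values
    (baseline; NOT function-preserving, unlike Net2WiderNet)."""
    n = fc1.out_features
    wide1 = nn.Linear(fc1.in_features, new_width)   # freshly initialized
    wide2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        wide1.weight[:n].copy_(fc1.weight)      # keep the teacher's weights...
        wide1.bias[:n].copy_(fc1.bias)
        wide2.weight[:, :n].copy_(fc2.weight)   # ...and leave the new rows and
        wide2.bias.copy_(fc2.bias)              # columns at their random init.
    return wide1, wide2
```

Because the extra columns of `wide2.weight` stay randomly initialized, the widened network's outputs differ from the teacher's, which is exactly the behavior Net2WiderNet is compared against.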