This repository implements the Randomized Nyström low-rank approximation method in Python for a symmetric positive semidefinite matrix.

Project and repository developed and curated by Tommaso Bozzi and Michele Lanfranconi.
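For orientation, below is a minimal single-process sketch of the randomized Nyström approximation with a Gaussian sketching matrix. The function name, the regularization shift, and the Cholesky-based core inversion are illustrative choices; the distributed implementation used in this project lives in `Utils/random_nystrom.py`.

```python
import numpy as np

def randomized_nystrom(A, l, k, seed=None):
    """Rank-k Nystrom approximation of a symmetric PSD matrix A
    using an n x l Gaussian sketching matrix (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Omega = rng.standard_normal((n, l))      # Gaussian sketching matrix
    C = A @ Omega                            # n x l sketch of A
    B = Omega.T @ C                          # l x l core matrix
    # A small shift guards the Cholesky factorization against round-off.
    L = np.linalg.cholesky(B + 1e-10 * np.trace(B) * np.eye(l))
    Z = np.linalg.solve(L, C.T).T            # Z = C L^{-T}, so A ~ Z Z^T
    Q, R = np.linalg.qr(Z)                   # thin QR of the n x l factor
    U, s, _ = np.linalg.svd(R)               # SVD of the small R factor
    U_hat = Q @ U[:, :k]                     # rank-k eigenvector estimate
    return U_hat, s[:k] ** 2                 # A ~ U_hat diag(s**2) U_hat.T
```

In the distributed version, the formation of $B$ and $C$ and the QR step (TSQR) are the parts spread over MPI ranks, as reflected in the runtime breakdown below.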
## Installation

To install the project locally, run the following commands:

```bash
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
pip install -r requirements.txt
```
## Usage

The functions implementing the method can be found in the Utils folder, and the code can be run locally as follows:

```bash
mpiexec -n 4 python Utils/random_nystrom.py
```
The matrices are created according to the instructions in the Project_Instructions.pdf file; their generation is implemented in matrix_generation.py.
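As a purely hypothetical illustration of this kind of synthetic input (the actual matrices are defined in Project_Instructions.pdf), a PSD test matrix with prescribed eigenvalue decay can be built as follows:

```python
import numpy as np

def psd_test_matrix(n, decay=0.5, seed=None):
    """Hypothetical PSD test matrix with exponentially decaying
    eigenvalues; the project's matrices follow Project_Instructions.pdf
    and are built in matrix_generation.py."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal Q
    eigvals = 10.0 ** (-decay * np.arange(n))         # exponential decay
    return (Q * eigvals) @ Q.T                        # Q diag(eigvals) Q^T
```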
In order to run the file, it is first necessary to download the matrices `mnist.mat` and `YearPredictionMSD.bz2` from the LIBSVM Data website, as per the Project Instructions, store them in a folder, and add that folder's path to the config file under `matrix_input_path`. It is also possible to choose where the matrices used by the algorithm are generated via `matrix_output_path`.
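For illustration, and assuming the two config keys point at directories as named above (the exact config format and layout may differ in the repository), the datasets could be loaded like this:

```python
import scipy.io
from sklearn.datasets import load_svmlight_file

# Hypothetical paths, as they might appear in the config file.
matrix_input_path = "/path/to/downloaded/matrices"   # downloaded datasets
matrix_output_path = "/path/to/generated/matrices"   # generated matrices

# mnist.mat is a MATLAB file that scipy reads directly.
mnist = scipy.io.loadmat(f"{matrix_input_path}/mnist.mat")

# YearPredictionMSD.bz2 is in LIBSVM (svmlight) format; scikit-learn
# decompresses .bz2 files on the fly based on the extension.
X, y = load_svmlight_file(f"{matrix_input_path}/YearPredictionMSD.bz2")
```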
## Results

This section provides a brief exploration of the main results, obtained by running the code on the Helvetios cluster at EPFL.
A first set of results is given by the stability analysis of the method, obtained by analyzing the nuclear norm of the difference between the original matrix and its low-rank approximation.
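Concretely, given the factors returned by the approximation, this quantity can be computed as below (normalizing by $\lVert A \rVert_*$ to obtain a relative error is a common variant; the helper name is illustrative):

```python
import numpy as np

def nystrom_nuclear_error(A, U_hat, eigvals):
    """Nuclear norm of A - U_hat diag(eigvals) U_hat^T, the quantity
    tracked in the stability analysis (illustrative helper)."""
    E = A - (U_hat * eigvals) @ U_hat.T
    return np.linalg.norm(E, ord='nuc')
```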
The plot compares the average runtime of the algorithm on the 5 matrices described in the project description, for both choices of sketching matrix at a fixed sketching dimension.
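Assuming the two sketching matrices compared are Gaussian and SRHT, as the references and the runtime breakdown below suggest, a sequential SRHT sketch looks as follows. This is illustrative only: it requires the sketched dimension to be a power of two, and the distributed code would instead use the block SRHT of Balabanov et al. (2023).

```python
import numpy as np

def fwht(X):
    """Fast Walsh-Hadamard transform applied to each row of X
    (row length must be a power of two), normalized by 1/sqrt(n)."""
    X = X.copy()
    n = X.shape[1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = X[:, i:i + h].copy()
            b = X[:, i + h:i + 2 * h].copy()
            X[:, i:i + h] = a + b
            X[:, i + h:i + 2 * h] = a - b
        h *= 2
    return X / np.sqrt(n)

def srht_sketch(A, l, seed=None):
    """Apply A -> A @ Omega with Omega = sqrt(n/l) D H S an SRHT
    sketching matrix (D random signs, H Hadamard, S column sampling)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    signs = rng.choice([-1.0, 1.0], size=n)        # diagonal of D
    cols = rng.choice(n, size=l, replace=False)    # sampled columns S
    Y = fwht(A * signs)                            # (A D) H, H symmetric
    return np.sqrt(n / l) * Y[:, cols]             # subsample and rescale
```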
### Scalability

This section shows how the runtime scales as the number of processors increases, and also highlights a slight difference associated with different sketching dimensions.
### Runtime breakdown

This section describes in more detail how each part of the algorithm affects the overall runtime:

- Hadamard Transform (only for SRHT)
- Creation of $\Omega$ and scatter of $A$
- Creation of $B$ and $C$
- TSQR algorithm (a minimal sketch follows this list)
- Other: SVD of $B$ and $R$ and sequential matrix multiplications
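The TSQR step can be summarized by the following minimal mpi4py sketch, which combines local R factors pairwise up a binary tree. It is illustrative only, not the repository's implementation, and returns the global R on rank 0:

```python
import numpy as np
from mpi4py import MPI

def tsqr_R(A_local, comm):
    """TSQR: each rank holds a row block of a tall-skinny matrix;
    local QR factors are reduced pairwise up a binary tree.
    Returns the global R factor on rank 0 (None elsewhere)."""
    _, R = np.linalg.qr(A_local)                  # local QR on each rank
    rank, size = comm.Get_rank(), comm.Get_size()
    step = 1
    while step < size:
        if rank % (2 * step) == 0:
            partner = rank + step
            if partner < size:
                R_other = comm.recv(source=partner, tag=step)
                _, R = np.linalg.qr(np.vstack([R, R_other]))
        elif rank % (2 * step) == step:
            comm.send(R, dest=rank - step, tag=step)
        step *= 2
    return R if rank == 0 else None
```

A script built around this helper would be launched the same way as the main code, e.g. `mpiexec -n 4 python script.py`.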
## References

- Tropp, J. A., Yurtsever, A., Udell, M., & Cevher, V. (2017). Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data. arXiv preprint arXiv:1706.05736.
- Balabanov, O., Beaupere, M., Grigori, L., & Lederer, V. (2023). Block Subsampled Randomized Hadamard Transform for Low-Rank Approximation on Distributed Architectures. ICML'23: Proceedings of the 40th International Conference on Machine Learning, 66, 1564–1576.
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. MNIST database available at http://yann.lecun.com/exdb/mnist/.
- Bertin-Mahieux, T. (2011). Year Prediction MSD [Dataset]. UCI Machine Learning Repository.
- Boutsidis, C., & Gittens, A. (2013). Improved Matrix Algorithms via the Subsampled Randomized Hadamard Transform. SIAM Journal on Matrix Analysis and Applications, 34(3), 1301–1340.