Skip to content

Repository containing the files for Project 2, course "HPC for numerical methods and data analysis", Fall 2024, EPFL

Notifications You must be signed in to change notification settings

MikyLanfra/Random_Nystrom

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Randomized Nyström Low Rank Approximation

Description

The repository aims to implement the Randomized Nyström Low Rank Approximation method in python for a matrix $A \in \mathbb{R}^{n\times n}$, using Gaussian or SRHT sketching. It contains the files for Project 2, course MATH-505 "HPC for numerical methods and data analysis", Fall 2024, EPFL.
Project and Repository developed and curated by Tommaso Bozzi and Michele Lanfranconi.

Table of Contents

Installation and Usage

To install the project locally, please run the following commands:

git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
pip install -r requirements.txt

The functions reproducing the code can be found in the Utils folder, and the code can be run locally as follows:

mpiexec -n 4 python Utils/random_nystrom.py

Matrices

The matrices are created according to the instructions in the Project_Instructions.pdf file, generated as in matrix_generation.py. In order to run the file it is necessary to download the matrices mnist.mat and YearPredictionMSD.bz2 from the LIBSVM Data website as per Project Instructions, store them in a folder and add the path to the config file in matrix_input_path. It is also possible to choose the location where to generate the matrices used for the algorithm as matrix_output_path.

Results

This section provides a brief exploration of the main results, obtained by running the code on the Helvetios Cluster at EPFL.

Stability Analysis

A first set of results is given by the stability analysis of the method, obtained by analyzing the nuclear norm of the difference between $A$ and its low-rank approximation given by the algorithm. The following plots show the expected behaviour of an exponential decrease in error as we increase the low-rank dimension, while also showing the importance of the sketching in the algorithm, alongside the choice of sketching dimension. My Image My Image

Sequential Runtimes

The plot compares the average runtime of the algorithm on the 5 matrices as described by the project description, for both choices of sketching matrices with sketching dimension $l=50$. My Image

Parallel Runtimes

This section shows the scaling in runtime as we increase the number of processors, while also showing a slight difference associated with different sketching dimensions. My Image My Image

Runtimes Breakdown

This section describes in more detail how each part of the algorithm affects the overall runtime:

  • Hadamard Transform (only for SHRT)
  • Creation of $\Omega$ and Scatter of $A$
  • Creation of $B$ and $C$
  • TSQR algorithm
  • Other - SVD of B and R and sequential matrix multiplications My Image

Bibliography

About

Repository containing the files for Project 2, course "HPC for numerical methods and data analysis", Fall 2024, EPFL

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published