This group project aims to predict missing entries in a movie-user ratings matrix from the MovieLens dataset. The codebase is organized into modular components for data preprocessing, matrix completion algorithms, evaluation metrics, and utilities.
For context and an overview of the work, refer to these resources:
- Instructions slides (instructions.pdf)
- Short report presenting our main ideas and results (report.pdf)
- Slides for the project presentation (slides.pdf)
- Python 3.8+
- Dependencies: run `pip install -r requirements.txt` from your desired environment
- Place your MovieLens sparse users/movies rating matrix in the `data/` directory (or any other directory)
- Run the main script: `python generate.py --name "data/sparse_matrix"`

Evaluation metrics for matrix completion performance.
Functions:
- `rmse(pred_matrix, true_matrix)`: Root Mean Square Error calculation
- `accuracy_exact(pred_matrix, true_matrix)`: Exact accuracy (rounded predictions)
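
As an illustration of what these two metrics compute, here is a minimal sketch. The function names and arguments come from the list above; the bodies, and the convention that missing ratings are stored as NaN and excluded from the score, are assumptions and may differ from the project's implementation.

```python
import numpy as np


def rmse(pred_matrix, true_matrix):
    """Root mean square error over the observed entries of true_matrix.

    Assumption: missing ratings in true_matrix are encoded as NaN.
    """
    pred = np.asarray(pred_matrix, dtype=float)
    true = np.asarray(true_matrix, dtype=float)
    observed = ~np.isnan(true)
    return float(np.sqrt(np.mean((pred[observed] - true[observed]) ** 2)))


def accuracy_exact(pred_matrix, true_matrix):
    """Share of observed entries whose rounded prediction equals the true rating."""
    pred = np.asarray(pred_matrix, dtype=float)
    true = np.asarray(true_matrix, dtype=float)
    observed = ~np.isnan(true)
    # np.rint rounds to the nearest integer (ties go to the even integer).
    return float(np.mean(np.rint(pred[observed]) == true[observed]))


# Toy example: one rating is missing and is ignored by both metrics.
true = np.array([[5.0, np.nan], [3.0, 1.0]])
pred = np.array([[4.0, 2.0], [3.2, 1.0]])
score_rmse = rmse(pred, true)
score_acc = accuracy_exact(pred, true)
```

Restricting both metrics to observed entries keeps the score from being dominated by the many cells that have no ground truth.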
Data preprocessing utilities for matrix completion tasks.
`DataPreprocessor` class methods:
- `fusion()`: Combine training and test matrices
- `split()`: Split ratings into train/test sets
- `normalize_by_user()`: User-centered normalization
- `denormalize_by_user()`: Restore original scale
- `filter_by_threshold()`: Remove users/movies with few ratings
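
To make the user-centered normalization concrete, the sketch below subtracts each user's mean rating and then restores it. It is written as standalone functions rather than as `DataPreprocessor` methods, and it assumes that rows are users, that missing ratings are NaN, and that every user has at least one rating; none of these details are confirmed by the project code.

```python
import numpy as np


def normalize_by_user(ratings):
    """Center each user's ratings on that user's mean (hypothetical standalone version).

    Assumptions: rows are users, NaN marks a missing rating,
    and every row contains at least one observed rating.
    """
    ratings = np.asarray(ratings, dtype=float)
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    return ratings - user_means, user_means


def denormalize_by_user(centered, user_means):
    """Add the per-user means back to recover the original rating scale."""
    return np.asarray(centered, dtype=float) + user_means


ratings = np.array([[5.0, 3.0, np.nan],
                    [1.0, np.nan, 2.0]])
centered, user_means = normalize_by_user(ratings)
restored = denormalize_by_user(centered, user_means)
```

Centering removes the offset between generous and harsh raters, so a completion algorithm fitted on the centered matrix models preferences relative to each user's own average.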
Matrix completion algorithms implementing different matrix completion approaches:
- `AverageCompletion`: Simple baseline method using row/column averages to fill missing values.
- `MatrixFactorisation`: Advanced method using matrix factorization with two different algorithms (Alternating Least Squares and gradient-based optimization).
- `IterativePCA`: Iterative PCA-based imputation that alternates between estimating missing entries and computing a low-rank PCA reconstruction until convergence.
- `IterativeKernelPCA`: Extension of the previous method, using kernel techniques to try to capture nonlinear behavior in the data.
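
The alternation described for `IterativePCA` can be sketched roughly as follows. This is not the project's class: the function name `iterative_pca_impute`, its parameters, the column-mean initialization, and the stopping rule are all assumptions chosen for illustration, with missing ratings again assumed to be NaN.

```python
import numpy as np


def iterative_pca_impute(ratings, rank=2, max_iter=100, tol=1e-6):
    """Fill NaN entries by alternating low-rank PCA reconstruction and re-imputation.

    Hypothetical sketch: assumes every column has at least one observed rating.
    """
    X = np.asarray(ratings, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(max_iter):
        # Low-rank PCA reconstruction of the current completed matrix.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean

        # Only missing entries are overwritten; observed ratings stay fixed.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


X = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 2.0],
              [1.0, 2.0, 5.0],
              [2.0, 1.0, 4.0]])
completed = iterative_pca_impute(X, rank=1)
```

The `rank` value plays the role of the number of principal components kept, which makes it a natural candidate for hyper-parameter tuning.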
Utilities for hyper-parameter selection and model validation. Provides basic cross-validation and tuning tools such as K-fold splitting and grid search.
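
For a ratings matrix, K-fold cross-validation is usually applied to the observed entries rather than to whole rows. The sketch below shows one way to do that; the generator name `k_fold_splits`, its parameters, and the NaN convention for missing ratings are assumptions and do not describe the project's actual utilities.

```python
import numpy as np


def k_fold_splits(ratings, n_folds=5, seed=0):
    """Yield (train, validation) matrices, hiding one fold of observed ratings at a time.

    Hypothetical helper: NaN marks a missing rating in the input and in both outputs.
    """
    ratings = np.asarray(ratings, dtype=float)
    observed = np.argwhere(~np.isnan(ratings))  # (row, col) index pairs
    rng = np.random.default_rng(seed)
    rng.shuffle(observed)  # shuffles the index pairs along the first axis

    for fold in np.array_split(observed, n_folds):
        rows, cols = fold[:, 0], fold[:, 1]
        train = ratings.copy()
        train[rows, cols] = np.nan  # hide the held-out ratings from training
        validation = np.full_like(ratings, np.nan)
        validation[rows, cols] = ratings[rows, cols]
        yield train, validation


ratings = np.array([[5.0, np.nan, 3.0],
                    [4.0, 2.0, np.nan],
                    [np.nan, 1.0, 5.0],
                    [3.0, 4.0, 2.0]])
folds = list(k_fold_splits(ratings, n_folds=3))
```

A grid search can then loop over candidate hyper-parameters, fit on each `train` matrix, score on the matching `validation` matrix, and keep the setting with the best average score.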
