This is a GitHub repository created to submit the fourth homework of the Algorithmic Methods for Data Mining (ADM) course for the MSc. in Data Science at the Sapienza University of Rome.
- `README.md`: A markdown file that explains the content of the repository.
- `main.ipynb`: A Jupyter Notebook file containing all the relevant exercises and reports belonging to the homework questions, the Command Line Question, and the Algorithmic Question.
- `modules/`: A folder including 4 Python modules used to solve the exercises in `main.ipynb`. The files included are:
  - `__init__.py`: An init file that allows us to import the modules into our Jupyter Notebook.
  - `data_handler.py`: A Python file including a `DataHandler` class designed to handle data cleaning and feature engineering on Kaggle's Netflix Clicks Dataset.
  - `recommender.py`: A Python file including a `Recommender` class designed to build a Recommendation Engine with LSH using user data obtained from Kaggle's Netflix Clicks Dataset.
  - `cluster.py`: A Python file including three classes: `FAMD`, `KMeans`, and `KMeans++`, designed to perform Factor Analysis of Mixed Data on Kaggle's Netflix Clicks Dataset and then perform parallelized k-Means and k-Means++ clustering using PySpark.
  - `plotter.py`: A Python file including a `Plotter` class designed to build auxiliary plots for the written report in `main.ipynb`.
- `commandline.sh`: A bash script including the code to solve the Command Line Question.
- `images/`: A folder containing a screenshot of the successful execution of the `commandline.sh` script.
- `.gitignore`: A template `.gitignore` file that tells Git which files or folders to ignore in a Python project.
- `LICENSE`: A file containing the permissive MIT license.
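
For readers unfamiliar with the LSH technique mentioned for `recommender.py`, here is a minimal, self-contained sketch of MinHash signatures with banding. It is only an illustration under our own assumptions and not the code of the `Recommender` class: the function names, the user sets, and the parameters `num_hashes` and `bands` are all hypothetical.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """Return a MinHash signature (list of ints) for a non-empty set of strings."""
    signature = []
    for seed in range(num_hashes):
        # Simulate a family of hash functions by salting a single hash with a seed.
        min_value = min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        )
        signature.append(min_value)
    return signature


def lsh_buckets(signature, bands=16):
    """Split a signature into bands; each (band index, rows) pair is a bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical users and the titles they clicked on (not taken from the dataset).
user_a = {"Movie 1", "Movie 2", "Movie 3", "Movie 4"}
user_b = {"Movie 1", "Movie 2", "Movie 3", "Movie 5"}
user_c = {"Movie 8", "Movie 9"}

sig_a, sig_b, sig_c = (minhash_signature(u) for u in (user_a, user_b, user_c))

# Users that share at least one bucket become candidate neighbours.
shared_ab = set(lsh_buckets(sig_a)) & set(lsh_buckets(sig_b))
print("Estimated Jaccard (a, b):", estimated_jaccard(sig_a, sig_b))
print("Shared buckets (a, b):", len(shared_ab))
```

Users whose signatures agree on every row of at least one band land in the same bucket, so only those pairs need an exact similarity check. More bands with fewer rows each makes the filter more permissive; fewer bands with more rows makes it stricter.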
In this homework we worked with Kaggle's predefined Netflix Clicks Dataset.
If the Notebook doesn't load through GitHub, please try the following steps:
- Try viewing the Notebook through NBViewer.
- Try downloading the Notebook and opening it on your local computer.
Author: Miguel Angel Sanchez Cortes
Email: [email protected]
MSc. in Data Science, Sapienza University of Rome