radinabakalov/moviemate

MovieMate

A personalized movie recommendation system built on the MovieLens 100K dataset. MovieMate supports three recommendation strategies (collaborative filtering, content-based filtering, and rule-based filtering), a diversifier that balances relevance against genre variety, and a continuous learning module that detects when the model needs retraining.

A full case analysis of the system's design decisions is available in CASE_ANALYSIS.md.


Project Structure

moviemate/
├── analysis/                      # Jupyter notebooks and figures for the case analysis
│   ├── splits/                    # train/val/test splits (generated by task1.ipynb)
│   ├── task1.ipynb                # Data partitioning and deployment alignment
│   ├── task2.ipynb                # Model selection and segment analysis
│   ├── task3.ipynb                # Cold-start simulation
│   ├── task4.ipynb                # Diversifier impact analysis
│   └── task5.ipynb                # Drift detection and update scheduling
├── modules/
│   ├── adaptive/
│   │   └── continuous_learning.py   
│   ├── filters/
│   │   ├── collaborative.py         
│   │   ├── content_based.py         
│   │   └── rule_based.py            
│   └── personalization/
│       ├── diversifier.py           
│       └── recommender.py           
├── storage/                       # MovieLens 100K data files
├── app.py 
├── CASE_ANALYSIS.md               # Case analysis 
└── requirements.txt

Setup

Python 3.11 or 3.12 is recommended. Install dependencies with:

pip install -r requirements.txt

Running the App

app.py shows an end-to-end example of loading a model, generating recommendations, and applying the diversifier:

python app.py

By default, it runs collaborative filtering with the tuned SVD parameters, generates top-10 recommendations for a sample user, and reranks them with the diversifier. To use a different model, change the model instantiation at the top of the file.


Running the Analysis Notebooks

Run the notebooks in analysis/ in order, because the later notebooks depend on the splits generated by task1.ipynb. Launch Jupyter from inside the analysis/ directory:

cd analysis
jupyter notebook

task1.ipynb creates the temporal 60/20/20 train/val/test split and saves it to analysis/splits/. All other notebooks load from there.
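A temporal split of this kind can be sketched as follows. This is a minimal illustration rather than the notebook's code: it assumes a single global chronological ordering over all ratings (the notebook may split per user instead), and the function name `temporal_split` is hypothetical.

```python
from typing import List, Tuple

# (user_id, item_id, rating, timestamp), matching the column order of u.data
Rating = Tuple[int, int, int, int]


def temporal_split(
    ratings: List[Rating], train_frac: float = 0.6, val_frac: float = 0.2
) -> Tuple[List[Rating], List[Rating], List[Rating]]:
    """Sort ratings by timestamp and cut them into train/val/test slices.

    Assumption: one global cut over all users, so every training rating
    is at least as old as every validation and test rating.
    """
    ordered = sorted(ratings, key=lambda r: r[3])
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Splitting by time rather than at random keeps future ratings out of the training data, which is closer to how a deployed recommender sees data.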

task2.ipynb trains all three models on the training split and evaluates them on the test split using RMSE, Precision@10, and NDCG@10, with a segment breakdown by age, gender, and occupation.
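For reference, the three metrics can be written in a few lines. The sketch below assumes binary relevance for the ranking metrics (an item is either relevant or not, for example rated at or above some threshold); the notebook may define relevance differently, and the function names are illustrative.

```python
import math
from typing import Sequence, Set


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

RMSE measures rating-prediction error, while Precision@10 and NDCG@10 measure the quality of the ranked list, so a model can do well on one and poorly on the other.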

task3.ipynb simulates the cold-start problem by progressively revealing user history and measuring how each model degrades. It also evaluates a hybrid routing strategy.
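The two ideas in this notebook, revealing history step by step and routing between models, can be sketched as below. The thresholds, the function names, and the specific routing rule (rule-based for users with no history, content-based for short histories, collaborative otherwise) are assumptions made for illustration; the notebook's actual strategy may differ.

```python
from typing import List, Tuple

# (user_id, item_id, rating, timestamp)
Rating = Tuple[int, int, int, int]


def reveal_history(
    user_ratings: List[Rating], n_revealed: int
) -> Tuple[List[Rating], List[Rating]]:
    """Split one user's ratings into the first n (known) and the rest (held out)."""
    ordered = sorted(user_ratings, key=lambda r: r[3])
    return ordered[:n_revealed], ordered[n_revealed:]


def route_model(
    n_known_ratings: int, content_min: int = 1, collaborative_min: int = 20
) -> str:
    """Pick a strategy name from the amount of history available.

    Both thresholds are hypothetical and would need tuning on validation data.
    """
    if n_known_ratings >= collaborative_min:
        return "collaborative"
    if n_known_ratings >= content_min:
        return "content_based"
    return "rule_based"
```

Calling `reveal_history` with increasing values of `n_revealed` and scoring each model on the held-out part gives a degradation curve per model.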

task4.ipynb sweeps the diversifier's alpha parameter from 0 to 1 and measures the tradeoff between ranking relevance (NDCG@10) and genre variety.
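One common way to implement such a tradeoff is a greedy rerank that blends relevance with genre novelty. The sketch below is not the code in diversifier.py: the scoring rule, the direction of alpha (here `alpha=1` means pure relevance and `alpha=0` means pure genre novelty), and the function name are all assumptions.

```python
from typing import List, Set, Tuple

# (item_id, relevance score in [0, 1], set of genre names)
Candidate = Tuple[str, float, Set[str]]


def diversify(candidates: List[Candidate], alpha: float = 0.5, k: int = 10) -> List[str]:
    """Greedily pick k items, scoring each by blended relevance and novelty.

    Novelty is the share of an item's genres not yet covered by the
    items already selected.
    """
    remaining = list(candidates)
    selected: List[str] = []
    covered: Set[str] = set()

    def blended(candidate: Candidate) -> float:
        _, relevance, genres = candidate
        novelty = len(genres - covered) / len(genres) if genres else 0.0
        return alpha * relevance + (1 - alpha) * novelty

    while remaining and len(selected) < k:
        best = max(remaining, key=blended)
        remaining.remove(best)
        selected.append(best[0])
        covered |= best[2]
    return selected
```

With `alpha=1.0` the output is the original relevance ordering; lowering alpha lets a less relevant item from an unseen genre overtake a more relevant item from a genre already in the list.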

task5.ipynb simulates data drift using monthly evaluation windows and compares three update policies: retraining triggered by a Kolmogorov-Smirnov (KS) test, retraining on a fixed monthly schedule, and no retraining.
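A KS-based drift trigger compares a reference sample with a recent window and fires when their empirical distributions differ by more than a critical value. The sketch below uses only the standard library and the large-sample critical value for a 5% significance level; it assumes the compared quantity is the rating distribution, which may not be what the notebook tests, and the function names are hypothetical. In practice `scipy.stats.ks_2samp` provides the same statistic together with a p-value.

```python
import math
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    largest_gap = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x so ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / n - j / m))
    return largest_gap


def drift_detected(
    reference: Sequence[float], window: Sequence[float], c_alpha: float = 1.358
) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value.

    c_alpha = 1.358 corresponds to a 5% significance level.
    """
    n, m = len(reference), len(window)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, window) > critical
```

A triggered policy retrains only in months where `drift_detected` returns true, which is the behavior being compared against the fixed monthly schedule and the no-retraining baseline.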


Data

The storage/ directory contains the MovieLens 100K dataset:

File           Description
u.data         100,000 ratings (user, item, rating, timestamp)
u.item         Movie metadata including title and genre flags
u.user         User demographics (age, gender, occupation, zip)
u.genre        Genre list
u.occupation   Occupation list
u.info         Dataset summary counts

Full dataset documentation is in storage/README.
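As an illustration of the file format, u.data is tab-separated with one rating per line in the column order listed above. The sketch below parses it with the standard library; the function name `read_ratings` is hypothetical, and the project's own loaders may use a different approach.

```python
import csv
from typing import Iterable, List, Tuple

# (user_id, item_id, rating, timestamp)
Rating = Tuple[int, int, int, int]


def read_ratings(lines: Iterable[str]) -> List[Rating]:
    """Parse tab-separated rating rows into integer tuples."""
    reader = csv.reader(lines, delimiter="\t")
    return [(int(user), int(item), int(rating), int(ts)) for user, item, rating, ts in reader]


# Usage, assuming the repository layout described above:
# with open("storage/u.data", newline="") as handle:
#     ratings = read_ratings(handle)
```

The function accepts any iterable of lines, so it works on an open file handle as well as on an in-memory list of strings.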

