A personalized movie recommendation system built on the MovieLens 100K dataset. MovieMate supports three recommendation strategies (collaborative filtering, content-based filtering, and rule-based), a diversifier for balancing relevance with genre variety, and a continuous learning module for detecting when the model needs retraining.
A full case analysis of the system's design decisions is available in CASE_ANALYSIS.md.

    moviemate/
    ├── analysis/                  # Jupyter notebooks and figures for the case analysis
    │   ├── splits/                # train/val/test splits (generated by task1.ipynb)
    │   ├── task1.ipynb            # Data partitioning and deployment alignment
    │   ├── task2.ipynb            # Model selection and segment analysis
    │   ├── task3.ipynb            # Cold-start simulation
    │   ├── task4.ipynb            # Diversifier impact analysis
    │   └── task5.ipynb            # Drift detection and update scheduling
    ├── modules/
    │   ├── adaptive/
    │   │   └── continuous_learning.py
    │   ├── filters/
    │   │   ├── collaborative.py
    │   │   ├── content_based.py
    │   │   └── rule_based.py
    │   └── personalization/
    │       ├── diversifier.py
    │       └── recommender.py
    ├── storage/                   # MovieLens 100K data files
    ├── app.py
    ├── CASE_ANALYSIS.md           # Case analysis
    └── requirements.txt

Python 3.11 or 3.12 is recommended. Install dependencies with:

    pip install -r requirements.txt

app.py shows an end-to-end example of loading a model, generating recommendations, and applying the diversifier:

    python app.py

By default it runs collaborative filtering with the tuned SVD parameters, generates top-10 recommendations for a sample user, and reranks the results with diversity. You can swap in any of the three models by changing the model instantiation at the top of the file.
The notebooks in analysis/ should be run in order, since later tasks depend on the splits generated by task 1. Run them from inside the analysis/ directory:

    cd analysis
    jupyter notebook

task1.ipynb creates the temporal 60/20/20 train/val/test split and saves it to analysis/splits/. All other notebooks load from there.
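
For readers unfamiliar with temporal splitting, the idea can be sketched as below. This is a minimal illustration and not the notebook's code: it assumes a single global cut over all ratings sorted by timestamp (the notebook may split differently, for example per user), and the function name `temporal_split` and the `timestamp` column name are hypothetical.

```python
import pandas as pd


def temporal_split(ratings: pd.DataFrame, train_frac: float = 0.6, val_frac: float = 0.2):
    """Split ratings chronologically into (train, val, test) frames.

    Assumption: one global cut over all ratings, ordered by the
    `timestamp` column. Whatever remains after train and val is test.
    """
    ordered = ratings.sort_values("timestamp", kind="stable").reset_index(drop=True)
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    train = ordered.iloc[:train_end]
    val = ordered.iloc[train_end:val_end]
    test = ordered.iloc[val_end:]
    return train, val, test
```

Splitting by time rather than at random keeps every validation and test rating later than the training data, which is closer to how a deployed recommender sees new ratings.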
task2.ipynb trains all three models on the training split and evaluates them on the test split using RMSE, Precision@10, and NDCG@10, with a segment breakdown by age, gender, and occupation.
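
The three metrics named above can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not the notebook's implementation: relevance is treated as binary (for example, a test rating at or above some threshold counts as relevant), and the function names are hypothetical.

```python
import math
from typing import Sequence, Set


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(ranked_items: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k


def ndcg_at_k(ranked_items: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

RMSE scores rating prediction accuracy, while Precision@10 and NDCG@10 score the ranked list itself; NDCG additionally rewards placing relevant items nearer the top.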
task3.ipynb simulates the cold-start problem by progressively revealing user history and measuring how each model degrades. It also evaluates a hybrid routing strategy.
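
The "progressively revealing user history" step could look roughly like the sketch below. It is only an assumption about the mechanics: the helper name `reveal_history`, the `(item, rating, timestamp)` tuple layout, and the step sizes are all hypothetical, and the hybrid routing strategy is not shown because its rules are specific to the notebook.

```python
from typing import Iterator, List, Sequence, Tuple

Rating = Tuple[int, float, int]  # (item_id, rating, timestamp), hypothetical layout


def reveal_history(
    user_ratings: Sequence[Rating],
    steps: Sequence[int] = (0, 1, 5, 10, 20),
) -> Iterator[Tuple[int, List[Rating], List[Rating]]]:
    """Yield (n_revealed, visible, hidden) for increasing history sizes.

    The user's earliest n ratings are made visible to the model and the
    rest are held out for evaluation. Steps larger than the user's full
    history are skipped.
    """
    ordered = sorted(user_ratings, key=lambda r: r[2])
    for n in steps:
        if n > len(ordered):
            break
        yield n, ordered[:n], ordered[n:]
```

Evaluating each model on the hidden part at every step gives a curve of quality against history length, which is what makes the degradation visible.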
task4.ipynb sweeps the diversifier's alpha parameter from 0 to 1 and measures the tradeoff between ranking relevance (NDCG@10) and genre variety.
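
To make the relevance/variety tradeoff concrete, here is a minimal greedy reranker. It is not the code in modules/personalization/diversifier.py: the function name, the blending formula, and the convention that alpha = 0 means pure relevance and alpha = 1 means pure genre novelty are all assumptions made for illustration, and relevance scores are assumed to be scaled to [0, 1].

```python
from typing import Dict, List, Set, Tuple


def rerank_with_diversity(
    candidates: List[Tuple[int, float]],
    genres: Dict[int, Set[str]],
    alpha: float = 0.5,
    k: int = 10,
) -> List[int]:
    """Greedily pick k items, blending relevance with genre novelty.

    Each round scores every remaining item as
    (1 - alpha) * relevance + alpha * novelty, where novelty is the
    share of the item's genres not yet covered by the selected items.
    """
    remaining = dict(candidates)
    selected: List[int] = []
    covered: Set[str] = set()

    def blended(item: int) -> float:
        item_genres = genres.get(item, set())
        novelty = len(item_genres - covered) / len(item_genres) if item_genres else 0.0
        return (1 - alpha) * remaining[item] + alpha * novelty

    while remaining and len(selected) < k:
        best = max(remaining, key=blended)
        selected.append(best)
        covered.update(genres.get(best, set()))
        del remaining[best]
    return selected
```

A sweep then amounts to calling this for a grid of alpha values between 0 and 1 and recording NDCG@10 alongside the number of distinct genres in each reranked list.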
task5.ipynb simulates data drift using monthly evaluation windows and compares a KS test-triggered retraining policy against a fixed monthly schedule and no retraining.
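
A KS-test trigger can be sketched with SciPy's two-sample Kolmogorov-Smirnov test. The sketch below is an assumption about how such a trigger might work, not the logic in continuous_learning.py: the function name `drift_detected`, the choice to compare raw rating values between a reference window and the current month, and the 0.05 threshold are all hypothetical.

```python
from typing import Sequence

from scipy.stats import ks_2samp


def drift_detected(
    reference: Sequence[float],
    current: Sequence[float],
    p_threshold: float = 0.05,
) -> bool:
    """Return True when the two samples differ significantly.

    Runs a two-sample KS test and flags drift when the p-value falls
    below the threshold (0.05 here is an illustrative default).
    """
    result = ks_2samp(reference, current)
    return bool(result.pvalue < p_threshold)
```

Under this kind of policy the model is retrained only in months where the check fires, whereas the fixed schedule retrains every month regardless and the no-retraining baseline never does.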
The storage/ directory contains the MovieLens 100K dataset:
| File | Description |
|---|---|
| u.data | 100,000 ratings (user, item, rating, timestamp) |
| u.item | Movie metadata including title and genre flags |
| u.user | User demographics (age, gender, occupation, zip) |
| u.genre | Genre list |
| u.occupation | Occupation list |
| u.info | Dataset summary counts |
Full dataset documentation is in storage/README.
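
As a quick way to inspect these files, the ratings and user tables can be read with pandas. This is a minimal sketch: the files ship without header rows, so the column names below are labels chosen for this example rather than names defined by the dataset, and the function names are hypothetical. u.data is tab-separated and u.user is pipe-separated.

```python
import pandas as pd

# Column labels are assumptions for this example; the files have no header row.
RATING_COLUMNS = ["user_id", "item_id", "rating", "timestamp"]
USER_COLUMNS = ["user_id", "age", "gender", "occupation", "zip_code"]


def load_ratings(path="storage/u.data") -> pd.DataFrame:
    """Read the tab-separated ratings file."""
    return pd.read_csv(path, sep="\t", names=RATING_COLUMNS, header=None)


def load_users(path="storage/u.user") -> pd.DataFrame:
    """Read the pipe-separated user demographics file.

    Zip codes are kept as strings so leading zeros and non-numeric
    values are preserved.
    """
    return pd.read_csv(
        path, sep="|", names=USER_COLUMNS, header=None, dtype={"zip_code": str}
    )
```

u.item is also pipe-separated but is not UTF-8 encoded, so reading it with pandas typically needs `encoding="latin-1"`.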