The Movie Recommendation System is a project designed to suggest movies to users based on their preferences or viewing history. This system leverages data analysis and machine learning techniques to recommend movies using collaborative filtering and content-based filtering approaches.
- Personalized movie recommendations.
- Support for collaborative filtering based on user ratings.
- Content-based filtering using movie features such as genres, directors, or cast.
- Interactive user interface for exploring recommendations.
The system uses a dataset containing movie details, user ratings, and other relevant metadata. Link to the dataset:
- MovieLens 1M dataset, provided by GroupLens Research
The dataset includes:
- Movie ID
- Title
- Genre
- User ID
- Rating
- Python 3.8 or higher
- Jupyter Notebook
- Required Python libraries:
- numpy
- pandas
- sklearn
- matplotlib
- seaborn
- collections
- tqdm
The recommendation system employs the following techniques:
Null values in the user-rating matrix are replaced with column means, SVD (Singular Value Decomposition) is applied, and the matrix is reconstructed from the first 20 singular values. The optimal number of clusters is taken as 3, and k-means++ is run on the reconstructed user-rating train matrix. Each user from the test matrix is assigned to the cluster whose centroid lies at minimum norm (distance), and that cluster's mean ratings are assigned to the user. The top 10 movies the new user has not seen are then recommended in descending order of cluster mean rating.
SVD + KMeans: RMSE = 0.0644, MAE = 0.0270
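The pipeline above can be sketched as follows. This is a minimal illustration on a randomly generated toy rating matrix, not the project's exact code; the matrix sizes, the `recommend` helper, and the NaN-for-unrated convention are assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy user-rating matrix (rows = users, cols = movies); NaN marks "unrated".
rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(50, 30)).astype(float)
R[R == 0] = np.nan

# 1. Replace null values with the per-movie mean rating.
col_means = np.nanmean(R, axis=0)
filled = np.where(np.isnan(R), col_means, R)

# 2. SVD, then reconstruct the matrix from the first k singular values (k = 20).
k = 20
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# 3. k-means++ with 3 clusters on the reconstructed train matrix.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(R_hat)
cluster_means = np.vstack([R_hat[km.labels_ == c].mean(axis=0) for c in range(3)])

# 4. Assign a new user to the nearest centroid (minimum norm), then recommend
#    the top 10 unseen movies in descending order of cluster mean rating.
def recommend(user_row, top_n=10):
    user_filled = np.where(np.isnan(user_row), col_means, user_row)
    c = int(np.argmin(np.linalg.norm(km.cluster_centers_ - user_filled, axis=1)))
    unseen = np.isnan(user_row)
    ranked = np.argsort(-cluster_means[c])
    return [int(m) for m in ranked if unseen[m]][:top_n]
```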
Collaborative filtering suggests movies based on user-item interactions. It uses:
- User-based filtering: Recommends movies liked by similar users. This is achieved by calculating user similarity with metrics such as cosine similarity or Pearson correlation. For example, if User A and User B have similar rating patterns, movies liked by User B are recommended to User A.
- Item-based filtering: Recommends movies similar to those a user has rated highly. This approach computes similarity between items (movies) based on user ratings. If a user rated Movie X highly, other movies with rating patterns similar to X are suggested.
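The user-based variant can be sketched with a small cosine-similarity predictor. This is a minimal illustration on a toy dense matrix where 0 means "unrated"; the function name and the k-neighbour scheme are assumptions for the sketch, not the project's exact implementation.

```python
import numpy as np

def predict_user_based(R, user, movie, k=5):
    """Predict R[user, movie] from the k most similar users (cosine similarity).
    R is a dense user x movie matrix with 0 meaning 'unrated'."""
    # Cosine similarity between the target user and every other user.
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[user] / (norms * norms[user] + 1e-9)
    sims[user] = -1.0  # exclude the user themself
    # Keep only neighbours who actually rated the movie.
    rated = np.where(R[:, movie] > 0)[0]
    neighbours = rated[np.argsort(-sims[rated])][:k]
    if len(neighbours) == 0:
        return 0.0
    w = sims[neighbours]
    # Similarity-weighted average of the neighbours' ratings.
    return float(w @ R[neighbours, movie] / (np.abs(w).sum() + 1e-9))
```

Item-based filtering follows the same idea with the matrix transposed, comparing movie columns instead of user rows.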
Results where SVD was not applied on the user-rating matrix before running the algorithms:
- User-based collaborative filtering: RMSE = 0.1969, MAE = 0.0436
- Item-based collaborative filtering: RMSE = 0.1923, MAE = 0.0418

Results where SVD was applied on the user-rating matrix first:
- User-based collaborative filtering: RMSE = 0.0671, MAE = 0.0288
- Item-based collaborative filtering: RMSE = 0.0592, MAE = 0.0251
Content-based filtering recommends movies by analyzing their features, such as titles and genres. The process involves:
- Data Preprocessing: Standardizing Title and Genres by converting them to lowercase and removing spaces, then combining them into a "soup" feature.
- Vectorization: Transforming the "soup" into a numerical representation using CountVectorizer.
- Similarity Calculation: Computing pairwise cosine similarity between movies to identify similar ones.
- Recommendation Generation: Recommending the top 10 movies most similar to those rated by the user, weighted by user ratings.
This ensures tailored recommendations based on movie content.
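The preprocess-vectorize-compare steps above can be sketched as follows. The three-movie table is an invented toy example; column names follow the dataset fields listed earlier.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.DataFrame({
    "Title": ["Toy Story", "Jumanji", "Heat"],
    "Genres": ["Animation|Comedy", "Adventure|Fantasy", "Action|Crime"],
})

# Standardize: lowercase, strip spaces, split genre separators into tokens.
def clean(col):
    return (col.str.lower()
               .str.replace(" ", "", regex=False)
               .str.replace("|", " ", regex=False))

# Combine the cleaned fields into a single "soup" feature per movie.
movies["soup"] = clean(movies["Title"]) + " " + clean(movies["Genres"])

# Vectorize the soup with CountVectorizer, then compute pairwise cosine similarity.
counts = CountVectorizer().fit_transform(movies["soup"])
sim = cosine_similarity(counts)

# Movies most similar to movie 0, excluding itself.
ranked = sim[0].argsort()[::-1]
recommendations = [int(i) for i in ranked if i != 0]
```

In the full system, the final step would additionally weight each candidate's similarity score by the user's rating of the movie it resembles.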
Handling a New Data Point
To demonstrate the recommender system's ability to handle a new data point, a new user (UserID = 10000) was created with manually assigned ratings for specific movies. These ratings reflected a preference for certain genres, namely sci-fi and action, and a dislike of comedy. The system then generated recommendations based on the user's ratings. The results confirmed that the recommended movies aligned with the user's genre preferences: the predicted movies belonged to the preferred genres and not the disliked one, highlighting the system's effectiveness in adapting to new data.
Reinforcement learning (RL) is used to dynamically optimize movie recommendations by learning from user feedback. The system incorporates several bandit algorithms to maximize cumulative rewards:
- A/B Testing: Provides a baseline for comparison.
- Epsilon-Greedy: Balances exploration and exploitation.
- Thompson Sampling: Uses Bayesian inference for better exploration.
- UCB (Upper Confidence Bound): Focuses on high-confidence recommendations.
- Gradient Bandit: Adjusts preferences based on feedback.
- LinUCB: Leverages contextual information for personalized suggestions.
Rewards are assigned when user ratings exceed a set threshold (e.g., 3/5). Algorithms adapt over time to prioritize movies yielding higher rewards.
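The reward scheme and one of the listed algorithms (epsilon-greedy) can be sketched as follows. The per-movie quality values and the simulated rating model are invented for the sketch; only the threshold rule (reward when rating > 3/5) comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "arm" is a movie; hidden per-movie mean ratings (toy values).
true_mean_rating = np.array([2.5, 3.2, 4.1, 3.8])

def pull(arm):
    # Simulate a user rating; reward 1 when the rating exceeds the 3/5 threshold.
    rating = np.clip(rng.normal(true_mean_rating[arm], 1.0), 0, 5)
    return 1.0 if rating > 3 else 0.0

def epsilon_greedy(n_visits=20_000, eps=0.10):
    n_arms = len(true_mean_rating)
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)  # running mean reward per arm
    total = 0.0
    for _ in range(n_visits):
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))   # explore a random movie
        else:
            arm = int(np.argmax(values))      # exploit the current best
        r = pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update
        total += r
    return total, values

total_reward, estimates = epsilon_greedy()
```

Over many visits the running means converge toward each movie's true reward probability, so the policy increasingly favours movies that clear the rating threshold.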
- Gradient Bandits and Thompson Sampling excel at balancing exploration and exploitation.
- Visualizations show cumulative rewards, highlighting algorithm performance over time.
The accompanying graph illustrates the total reward at Visit 20,000 for different bandit algorithms:
- Gradient Bandits achieves the highest cumulative rewards, followed by Thompson Sampling, demonstrating their effectiveness.
- A/B testing with 1K and LinUCB perform the worst on this dataset.
- UCB, epsilon-greedy with 0.10, epsilon-greedy with 0.05, and A/B testing with 5K fall between the best and worst performances recorded on this dataset.
This visualization emphasizes how algorithms such as Gradient Bandits and Thompson Sampling adaptively optimize recommendations, ensuring they align with evolving user preferences.
Reinforcement learning ensures adaptive, personalized recommendations as user preferences evolve.
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License. See the LICENSE file for details.
- GroupLens Research for MovieLens datasets.
- IMDb for movie metadata.
Feel free to reach out for questions or collaboration opportunities!