shendu-95/movie_recommender_system

Movie Recommender System

A content-based movie recommendation system that suggests movies similar to a user's selection. The engine uses Natural Language Processing (NLP) to analyze movie metadata (genres, cast, crew, and keywords) and serves recommendations via a Streamlit web application.

🚀 Overview

This project processes the TMDB 5000 Movie Dataset to build a content-based recommender. Instead of relying on user ratings, it compares the content of the movies themselves.

  • Data Processing: Cleans and merges datasets, extracts key features, and creates a unified "tag" system for every movie.
  • Machine Learning: Uses CountVectorizer to convert each movie's text tags into a vector of word counts over the 5000 most frequent words, then calculates Cosine Similarity between those vectors to find the closest matches.
  • Web App: A user-friendly interface built with Streamlit that displays movie recommendations and fetches real-time posters from the TMDB API.
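
As a rough illustration of the vectorization and similarity step, the sketch below builds a bag-of-words vocabulary and compares tag strings using only the standard library. It is a stand-in for scikit-learn's CountVectorizer and cosine_similarity, not the notebook's code. The function names and sample tags are made up for this example, and CountVectorizer's own tokenization rules differ from the plain whitespace split used here.

```python
import math
from collections import Counter


def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent tokens across all documents."""
    counts = Counter(token for doc in docs for token in doc.lower().split())
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:max_features]]


def vectorize(doc, vocabulary):
    """Turn one document into a list of word counts, one entry per vocabulary token."""
    counts = Counter(doc.lower().split())
    return [counts[token] for token in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up tag strings, for illustration only.
tags = [
    "action space alien marine",
    "action space war marine",
    "romance paris love",
]

vocab = build_vocabulary(tags, max_features=5000)
vectors = [vectorize(t, vocab) for t in tags]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # three of four words shared
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no words shared
print(sim_01, sim_02)
```

The first two tag strings share three of their four words and score 0.75, while the third shares none and scores 0.0. Cosine similarity depends on the angle between vectors rather than their length, so a long tag string is not favored just for having more words.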

🛠️ Technologies Used

  • Python 3.x
  • Pandas & NumPy: Data manipulation and analysis.
  • Scikit-learn: Used for CountVectorizer and cosine_similarity.
  • NLTK: PorterStemmer reduces words to their root form (e.g., "dancing" → "danc").
  • Streamlit: Frontend framework for the web application.
  • TMDB API: Used to fetch movie posters dynamically.

📂 Project Structure

  • movie-recommender-system.ipynb: Jupyter Notebook containing the data preprocessing pipeline, vectorization, and model generation.
  • app.py: The main Streamlit application script.
  • tmdb_5000_movies.csv: Metadata dataset (budget, overview, popularity, etc.).
  • tmdb_5000_credits.csv: Credits dataset (cast, crew).
  • movie.pkl: (Generated) Pickled dataframe containing movie titles and tags.
  • similarity.pkl: (Generated) Pickled cosine similarity matrix.

⚙️ How It Works

1. Data Pipeline (movie-recommender-system.ipynb)

  1. Merging: The movies and credits datasets are merged on the movie title.
  2. Feature Extraction:
    • Genres & Keywords: Extracted from JSON format.
    • Cast: Top 3 actors are extracted.
    • Crew: The Director is isolated.
  3. Text Cleaning: Spaces are removed from names (e.g., "Sam Worthington" becomes "SamWorthington") so that each full name becomes a single token instead of being split into separate first-name and last-name tokens.
  4. Vectorization: A tags column is created by combining the overview, genres, keywords, cast, and crew. This text is stemmed and vectorized using a Bag-of-Words approach (5000 most frequent words).
  5. Model Export: The resulting dataframe and similarity matrix are exported as .pkl files for the app to use.
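
The feature extraction, text cleaning, and tag-building steps above can be sketched as follows. This is a minimal illustration rather than the notebook's code: the helper names are hypothetical, the sample row is invented, and it assumes each JSON column is a list of objects with a "name" field (plus a "job" field for crew). Stemming is left out to keep the example dependency-free.

```python
import json


def names(json_text, limit=None):
    """Return the 'name' field of each object in a JSON list, optionally truncated."""
    result = [item["name"] for item in json.loads(json_text)]
    return result if limit is None else result[:limit]


def director(crew_json):
    """Return the names of crew members whose job is 'Director'."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]


def collapse(name_list):
    """Remove spaces so each multi-word name becomes a single token."""
    return [name.replace(" ", "") for name in name_list]


def build_tags(overview, genres, keywords, cast, crew):
    """Combine overview words, genres, keywords, top 3 cast, and director into one string."""
    tokens = (
        overview.split()
        + collapse(names(genres))
        + collapse(names(keywords))
        + collapse(names(cast, limit=3))
        + collapse(director(crew))
    )
    return " ".join(tokens).lower()


# Invented sample row; the real dataset rows carry more fields per object.
tags = build_tags(
    overview="A paraplegic marine is sent to Pandora",
    genres='[{"name": "Action"}, {"name": "Science Fiction"}]',
    keywords='[{"name": "space war"}]',
    cast='[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"},'
         ' {"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]',
    crew='[{"name": "James Cameron", "job": "Director"},'
         ' {"name": "Some Editor", "job": "Editor"}]',
)
print(tags)
```

Only the first three cast members and the director make it into the tag string; the fourth cast member and the non-director crew entry are dropped.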

2. The Application (app.py)

The app loads the pickled dataframe and precomputed similarity matrix, then provides a dropdown menu for movie selection. When the "Recommend" button is clicked, the system:

  1. Finds the index of the selected movie.
  2. Retrieves the 5 most similar movies based on the cosine similarity matrix.
  3. Fetches poster URLs using the TMDB API.
  4. Displays the titles and posters in a 5-column grid.
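
The lookup in steps 1 and 2 can be sketched as below. This is an illustration under assumptions, not the code in app.py: the `recommend` function, the placeholder titles, and the similarity values are all made up, and plain lists stand in for the pickled dataframe and matrix.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to the given one, excluding itself."""
    index = titles.index(title)
    scores = similarity[index]
    # Pair each score with its row index, then sort from most to least similar.
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != index][:top_n]


# Placeholder titles and a made-up symmetric similarity matrix.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
similarity = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
]

picks = recommend("Movie A", titles, similarity, top_n=2)
print(picks)
```

Each movie has a similarity of 1.0 with itself, so the selected movie is filtered out before taking the top results.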

🔧 Setup & Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/movie-recommender-system.git
  2. Install dependencies:

    pip install streamlit pandas numpy scikit-learn nltk requests
  3. Generate Models: Open movie-recommender-system.ipynb in Jupyter and run all cells. This creates the movie.pkl and similarity.pkl files that the app needs.

  4. Run the App:

    streamlit run app.py

📝 API Configuration

The app.py file contains an authorization bearer token for the TMDB API to fetch posters.
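
One possible way to keep the token out of the source file is to read it from an environment variable. The sketch below shows this together with building a poster URL from a TMDB movie-details response. It is not the code in app.py: the variable name TMDB_BEARER_TOKEN and the helper names are assumptions for this example, and it uses the standard library's urllib to construct the request (without sending it) rather than the requests package the app installs.

```python
import os
import urllib.request

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"


def build_request(movie_id, token):
    """Build (but do not send) an authorized request for one movie's details."""
    return urllib.request.Request(
        TMDB_MOVIE_URL.format(movie_id=movie_id),
        headers={
            "Authorization": f"Bearer {token}",
            "accept": "application/json",
        },
    )


def poster_url(movie_details):
    """Return the full poster URL for a movie-details dict, or None if it has no poster."""
    path = movie_details.get("poster_path")
    return f"{TMDB_IMAGE_BASE}{path}" if path else None


# TMDB_BEARER_TOKEN is a hypothetical variable name; the fallback is a dummy value.
token = os.environ.get("TMDB_BEARER_TOKEN", "placeholder-token")
request = build_request(19995, token)  # 19995 is just an example movie ID

print(request.full_url)
print(poster_url({"poster_path": "/example.jpg"}))
```

Passing `request` to `urllib.request.urlopen` and decoding the JSON body would yield the dict that `poster_url` expects. Checking for a missing `poster_path` lets the app skip or substitute a placeholder image instead of building a broken URL.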
