Movie Recommender System

A content-based movie recommendation system that suggests movies similar to a user's selection. The engine uses Natural Language Processing (NLP) to analyze movie metadata (genres, cast, crew, and keywords) and serves recommendations via a Streamlit web application.

🚀 Overview

This project processes the TMDB 5000 Movie Dataset to create a recommendation algorithm. Instead of using user ratings, it focuses on the content of the movies themselves.

Data Processing: Cleans and merges datasets, extracts key features, and creates a unified "tag" system for every movie.
Machine Learning: Uses CountVectorizer to convert text tags into vectors and calculates Cosine Similarity to find the closest matches in a 5000-dimensional space.
Web App: A user-friendly interface built with Streamlit that displays movie recommendations and fetches real-time posters from the TMDB API.

🛠️ Technologies Used

Python 3.x
Pandas & NumPy: Data manipulation and analysis.
Scikit-learn: Used for CountVectorizer and cosine_similarity.
NLTK: Uses PorterStemmer to reduce words to their stems (e.g., "dancing" → "danc").
Streamlit: Frontend framework for the web application.
TMDB API: Used to fetch movie posters dynamically.

📂 Project Structure

movie-recommender-system.ipynb: Jupyter Notebook containing the data preprocessing pipeline, vectorization, and model generation.
app.py: The main Streamlit application script.
tmdb_5000_movies.csv: Metadata dataset (budget, overview, popularity, etc.).
tmdb_5000_credits.csv: Credits dataset (cast, crew).
movie.pkl: (Generated) Pickled dataframe containing movie titles and tags.
similarity.pkl: (Generated) Pickled cosine similarity matrix.

⚙️ How It Works

1. Data Pipeline (`movie-recommender-system.ipynb`)

Merging: The movies and credits datasets are merged on the movie title.
Feature Extraction:
- Genres & Keywords: Extracted from JSON format.
- Cast: Top 3 actors are extracted.
- Crew: The Director is isolated.
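The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes each column holds a JSON-encoded list of objects with a `name` field (plus a `job` field for crew entries), and the helper names `names` and `director` and the sample strings are made up for the example.

```python
import json


def names(json_text, limit=None):
    """Return the "name" field of each object in a JSON-encoded list."""
    items = json.loads(json_text)
    if limit is not None:
        items = items[:limit]  # e.g. keep only the top 3 cast members
    return [item["name"] for item in items]


def director(crew_json):
    """Return the director's name in a one-element list, or [] if none is listed."""
    for member in json.loads(crew_json):
        if member.get("job") == "Director":
            return [member["name"]]
    return []


# Illustrative sample values (assumed shape, not copied from the dataset).
genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
cast = (
    '[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}, '
    '{"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]'
)
crew = (
    '[{"job": "Editor", "name": "Stephen E. Rivkin"}, '
    '{"job": "Director", "name": "James Cameron"}]'
)

genre_names = names(genres)
top_cast = names(cast, limit=3)
directed_by = director(crew)
```

Returning the director as a list keeps all four features the same type, which makes it straightforward to concatenate them later.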
Text Cleaning: Spaces are removed from names (e.g., "Sam Worthington" becomes "SamWorthington") to create unique vector tokens.
Vectorization: A tags column is created by combining the overview, genres, keywords, cast, and crew. This text is stemmed and vectorized using a Bag-of-Words approach (5000 most frequent words).
Model Export: The resulting dataframe and similarity matrix are exported as .pkl files for the app to use.
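As a rough sketch of the cleaning and vectorization steps described above, the snippet below builds tags for three invented movies, stems them, and computes the similarity matrix. The helper names (`collapse`, `build_tag`) and the sample rows are hypothetical. Only `max_features=5000` comes from the description above, and any other `CountVectorizer` settings the notebook may use are not shown.

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()


def collapse(names):
    """Remove spaces so each multi-word name becomes a single token."""
    return [name.replace(" ", "") for name in names]


def build_tag(overview, genres, keywords, cast, crew):
    """Combine all features into one lowercase, stemmed tag string."""
    tokens = (
        overview.split()
        + collapse(genres)
        + collapse(keywords)
        + collapse(cast)
        + collapse(crew)
    )
    return " ".join(stemmer.stem(token.lower()) for token in tokens)


# Invented rows for illustration only.
tags = [
    build_tag("A marine is sent to an alien moon", ["Action", "Science Fiction"],
              ["space war"], ["Sam Worthington"], ["James Cameron"]),
    build_tag("A crew fights aliens on a distant moon", ["Action", "Science Fiction"],
              ["space war"], ["Sigourney Weaver"], ["James Cameron"]),
    build_tag("Two friends open a bakery in Paris", ["Comedy"],
              ["baking"], ["Jane Doe"], ["John Roe"]),
]

# Bag-of-Words counts, capped at the 5000 most frequent terms.
vectorizer = CountVectorizer(max_features=5000)
vectors = vectorizer.fit_transform(tags)

# similarity[i][j] is the cosine of the angle between movies i and j.
similarity = cosine_similarity(vectors)
```

Here the two science-fiction rows share several tokens (genre, keyword, director), so they score higher against each other than either does against the comedy.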

2. The Application (`app.py`)

The app loads the pickled dataframe and similarity matrix and provides a dropdown menu for movie selection. When the "Recommend" button is clicked, the system:

Finds the index of the selected movie.
Retrieves the 5 most similar movies based on the cosine similarity matrix.
Fetches poster URLs using the TMDB API.
Displays the titles and posters in a 5-column grid.
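The lookup steps above could look something like the sketch below. It is an illustration under assumptions rather than the code in app.py: the function name `recommend`, the `title` column name, and the toy similarity matrix are all hypothetical.

```python
import numpy as np
import pandas as pd


def recommend(title, movies, similarity, top_n=5):
    """Return the top_n titles most similar to `title`, excluding the movie itself."""
    # Positional index of the selected movie.
    index = np.flatnonzero((movies["title"] == title).to_numpy())[0]
    # Sort all movies by similarity to the selected one, highest first.
    ranked = np.argsort(similarity[index])[::-1]
    picks = [i for i in ranked if i != index][:top_n]
    return movies["title"].iloc[picks].tolist()


# Toy data: four placeholder titles and a symmetric similarity matrix.
movies = pd.DataFrame({"title": ["Alpha", "Beta", "Gamma", "Delta"]})
similarity = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

closest = recommend("Alpha", movies, similarity, top_n=2)
```

Filtering out the selected movie explicitly, instead of skipping the first sorted entry, avoids dropping a different movie when two rows tie at a similarity of 1.0.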

🔧 Setup & Installation

Clone the repository:

git clone https://github.com/yourusername/movie-recommender-system.git

Install dependencies:

pip install streamlit pandas numpy scikit-learn nltk requests

Generate Models: Open movie-recommender-system.ipynb in Jupyter and run all cells. This creates movie.pkl and similarity.pkl.
Run the App:
```
streamlit run app.py
```

📝 API Configuration

The app.py file contains an authorization bearer token for the TMDB API to fetch posters.
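One possible alternative to keeping the token in the source file is to read it from an environment variable. The sketch below assumes a variable named `TMDB_TOKEN` (a hypothetical name) and uses the TMDB v3 movie-details endpoint with a bearer header. The function names are illustrative and are not taken from app.py.

```python
import os

import requests

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
POSTER_BASE_URL = "https://image.tmdb.org/t/p/w500"


def auth_headers(token):
    """Build the request headers for bearer-token authentication."""
    return {"accept": "application/json", "Authorization": f"Bearer {token}"}


def poster_url(payload):
    """Turn a movie-details response into a full poster URL, or None if absent."""
    path = payload.get("poster_path")
    return f"{POSTER_BASE_URL}{path}" if path else None


def fetch_poster(movie_id, token):
    """Request movie details from TMDB and return the poster URL."""
    response = requests.get(
        TMDB_MOVIE_URL.format(movie_id=movie_id),
        headers=auth_headers(token),
        timeout=10,
    )
    response.raise_for_status()
    return poster_url(response.json())


# Example usage (requires network access and a valid token):
# token = os.environ["TMDB_TOKEN"]  # hypothetical variable name
# print(fetch_poster(19995, token))
```

Separating `poster_url` from the network call lets the URL handling be checked without contacting the API.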
