🎬 Movie Recommendation System

A comprehensive movie recommendation system that combines Content-Based Filtering and Genre-Based Filtering with Sentiment Analysis for user reviews. The system features a modern web interface built with Flask and includes advanced machine learning algorithms for personalized movie suggestions.

🎯 System Overview

Core Functionality

Content-Based Movie Recommendations: Uses TF-IDF vectorization and cosine similarity
Genre-Based Filtering: Weighted scoring system with year filtering
Sentiment Analysis: ML-powered review classification (Good/Bad)
Real-time Web Scraping: Live data from IMDB for reviews and metadata
Auto-complete Search: Smart movie title suggestions
Responsive Web Interface: Modern UI with AJAX-powered interactions


Objectives Achieved

✅ Movie recommendations based on title input
✅ Genre and year-based filtering
✅ Sentiment analysis of user reviews
✅ Content-based filtering with TF-IDF
✅ Collaborative filtering analysis (Jupyter Notebook)
✅ Neural Network Matrix Factorization (Jupyter Notebook)

To create a movie recommendation system using Collaborative Filtering and machine learning algorithms such as K Nearest Neighbours.
The system should recommend movies based on the movie title entered by the user.
The system should also be able to recommend movies on the basis of 'genre only' and 'genre and year' entered.
The system should apply sentiment analysis to categorize user comments on a particular movie.
Additional filtering that uses a neural network to perform Matrix Factorization is available in the Jupyter notebook (Recommovie_9604_Notebook.ipynb).

🏗️ Technical Architecture

Backend
Python 3.13 + Flask 3.0.0 (Web Framework)
Scikit-learn 1.7.1 (ML: TF-IDF, Cosine Similarity, Naive Bayes)
NumPy 2.3.2 + Pandas 2.3.1 (Data Processing)
BeautifulSoup4 4.12.2 + LXML 6.0.0 (Web Scraping)
Pickle (Model Serialization)

Frontend
HTML5/CSS3 + JavaScript ES6+
Bootstrap 4.x (Responsive UI)
jQuery 3.x (AJAX, DOM Manipulation)
AutoComplete.js 7.2.0 (Smart Search)

Data & APIs
MovieLens Dataset (Primary Data)
TMDB API (Movie Metadata)
IMDB (Web Scraping for Reviews)
CSV/JSON (Data Storage)

System Components

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web Interface │    │ Recommendation  │    │ Data Processing │
│   (Flask App)   │◄──►│    Engine       │◄──►│    Pipeline     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │ ML Models       │    │ External APIs   │
│ (HTML/CSS/JS)   │    │ (Pickle Files)  │    │ (TMDB/IMDB)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Flow Diagram

Data Flow

User Input → Flask Routes → Recommendation Engine
Content Processing → TF-IDF Vectorization → Similarity Matrix
Genre Filtering → Weighted Scoring → Top-N Results
Web Scraping → IMDB Reviews → Sentiment Analysis
Results Rendering → Template Engine → User Interface

🧠 Algorithms & Implementation

1. Content-Based Filtering (Primary Algorithm)

Location: main.py lines 18-50

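import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def create_similarity():
    # 'comb' holds the concatenated text features for each movie
    data = pd.read_csv('main_data.csv')
    cv = CountVectorizer()
    count_matrix = cv.fit_transform(data['comb'])
    # Pairwise cosine similarity between every pair of movies
    similarity = cosine_similarity(count_matrix)
    return data, similarity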

Algorithm Details:

Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency); note that the snippet above builds raw term counts with CountVectorizer
Similarity Metric: Cosine Similarity
Feature Combination: Movie metadata concatenated in 'comb' column
Recommendation Logic: Top-10 most similar movies

Mathematical Foundation:

Cosine Similarity = (A · B) / (||A|| × ||B||)
where A, B are the feature vectors of two movies, built from the 'comb' column
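
A minimal sketch of this step with TF-IDF weighting, for comparison with the count-based snippet above. It swaps in scikit-learn's TfidfVectorizer and runs on made-up toy rows rather than main_data.csv; the `recommend` helper and its `top_n` parameter are hypothetical names used only for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for main_data.csv; titles and 'comb' strings are made up.
data = pd.DataFrame({
    "movie_title": ["space war", "space battle", "love story", "romantic tale"],
    "comb": [
        "space war alice bob sci-fi action",
        "space battle alice carol sci-fi action",
        "love story dave erin romance drama",
        "romantic tale dave frank romance drama",
    ],
})

# One TF-IDF row per movie, then all pairwise cosine similarities.
tfidf_matrix = TfidfVectorizer().fit_transform(data["comb"])
similarity = cosine_similarity(tfidf_matrix)  # shape: (n_movies, n_movies)


def recommend(title, top_n=10):
    """Return up to top_n titles ranked by cosine similarity to `title`."""
    matches = data.index[data["movie_title"] == title]
    if len(matches) == 0:
        return []  # unknown title
    idx = matches[0]
    # Sort every movie by similarity (highest first) and skip the query itself.
    ranked = [i for i in similarity[idx].argsort()[::-1] if i != idx][:top_n]
    return data["movie_title"].iloc[ranked].tolist()


print(recommend("space war", top_n=1))
```

Because TF-IDF down-weights terms that appear in many movies, frequent tokens such as common genre names contribute less to the score than they would with raw counts.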

2. Genre-Based Filtering

Location: main.py lines 70-85

import pandas as pd

def best_movies_by_genre(genre, top_n, year=1920):
    movie_score = pd.read_csv('movie_score.csv')
    # Titles end in "(YYYY)"; slice out the four-digit release year
    movie_score['year'] = movie_score['title'].apply(lambda title: int(title[-5:-1]))
    # Case-insensitive genre matching
    # Weighted scoring by rating and count

Algorithm Details:

Weighted Scoring: weighted_score = (count * mean) / (count + minimum_required)
Year Filtering: Movies from specified year onwards
Case-Insensitive Matching: Robust genre name handling
Available Genres: 19 genres including Action, Adventure, Comedy, Drama, etc.
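
The scoring and filtering rules above can be sketched as follows. This is an illustration under assumptions, not the project's code: the column names `genres`, `count`, and `mean`, the default `minimum_required=50`, and the function name `best_by_genre_sketch` are all hypothetical, and the rows are made up.

```python
import pandas as pd


def best_by_genre_sketch(movie_score, genre, top_n, year=1920, minimum_required=50):
    """Rank movies of one genre by (count * mean) / (count + minimum_required)."""
    df = movie_score.copy()
    # Titles are assumed to end in "(YYYY)"; rows without a year would fail here.
    df["year"] = df["title"].str.extract(r"\((\d{4})\)\s*$", expand=False).astype(int)
    # Case-insensitive genre match, then keep movies released in `year` or later.
    in_genre = df["genres"].str.lower().str.contains(genre.lower(), regex=False)
    df = df[in_genre & (df["year"] >= year)].copy()
    df["weighted_score"] = (df["count"] * df["mean"]) / (df["count"] + minimum_required)
    return df.sort_values("weighted_score", ascending=False).head(top_n)


# Made-up rows: 'count' is the number of ratings, 'mean' the average rating.
movies = pd.DataFrame({
    "title": ["Alpha (1995)", "Beta (2005)", "Gamma (2010)", "Delta (2012)"],
    "genres": ["Action|Comedy", "Action", "Drama", "action|Thriller"],
    "count": [1000, 10, 500, 200],
    "mean": [4.0, 5.0, 4.5, 3.5],
})

top = best_by_genre_sketch(movies, "ACTION", top_n=2, year=2000)
print(top[["title", "weighted_score"]])
```

With these numbers, "Beta" has a perfect mean but only 10 ratings, so it scores 50 / 60 ≈ 0.83 and ranks below "Delta" at 700 / 250 = 2.8. The formula deliberately penalizes titles with few ratings.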

3. Sentiment Analysis

Location: main.py lines 175-190

import pickle

# Pre-trained model loading
with open('nlp_model.pkl', 'rb') as f:
    clf = pickle.load(f)
with open('tranform.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

# Prediction pipeline
movie_vector = vectorizer.transform(movie_review_list)
pred = clf.predict(movie_vector)
# pred is an array; index the first prediction explicitly
reviews_status.append('Good' if pred[0] else 'Bad')

Model Details:

Algorithm: Multinomial Naive Bayes
Features: TF-IDF vectorized text
Classes: Binary (Good/Bad)
Training Data: IMDB review dataset
Accuracy: ~85% (evaluation set not stated)
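
For orientation, a classifier of this kind could be trained roughly as follows. This is a sketch on six made-up reviews, not the project's training script; the label encoding (1 = Good, 0 = Bad) and the `classify` helper are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus; the real model is described as trained on IMDB reviews.
reviews = [
    "a wonderful and moving film",
    "great acting and a great story",
    "loved every minute of it",
    "terrible plot and awful acting",
    "boring and far too long",
    "a complete waste of time",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = Good, 0 = Bad

# Fit the vectorizer and the classifier on the same training text.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)
clf = MultinomialNB().fit(X, labels)


def classify(texts):
    """Map each review text to 'Good' or 'Bad'."""
    preds = clf.predict(vectorizer.transform(texts))
    return ["Good" if p == 1 else "Bad" for p in preds]


print(classify(["great story, loved it", "awful and boring"]))
```

New reviews must be transformed with the same fitted vectorizer that was used in training, which is why both objects are pickled together.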

4. Collaborative Filtering (Jupyter Notebook)

Location: Recommovie_9604_Notebook.ipynb

Implemented Algorithms:

K-Nearest Neighbors: User-based collaborative filtering
Matrix Factorization: SVD (Singular Value Decomposition)
Neural Network Matrix Factorization: Custom neural network implementation

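import numpy as np
from scipy.sparse.linalg import svds
from sklearn.neighbors import NearestNeighbors

# K-NN Implementation
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(movie_wide)

# SVD Implementation
u, s, vt = svds(train_data_matrix, k=latent_features)
s_diag_matrix = np.diag(s)  # singular values as a diagonal matrix
X_pred = np.dot(np.dot(u, s_diag_matrix), vt)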
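
To make the two calls concrete, here is a small self-contained sketch on a made-up 4x4 rating matrix. The matrix values, the choice of k=2, and the item-based orientation (movies as rows for k-NN) are assumptions for illustration; the notebook's actual `movie_wide` and `train_data_matrix` are not shown in this README.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.neighbors import NearestNeighbors

# Toy user x movie rating matrix (0 = unrated); values are made up.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# k-NN on movie vectors: each row of ratings.T is one movie's ratings.
model_knn = NearestNeighbors(metric="cosine", algorithm="brute")
model_knn.fit(ratings.T)
distances, indices = model_knn.kneighbors(ratings.T[[0]], n_neighbors=2)
nearest_to_movie_0 = int(indices[0][1])  # indices[0][0] is movie 0 itself

# Rank-2 truncated SVD, then rebuild a dense matrix of predicted ratings.
u, s, vt = svds(ratings, k=2)
X_pred = u @ np.diag(s) @ vt

print(nearest_to_movie_0)
print(np.round(X_pred, 2))
```

The entries of `X_pred` at positions that were 0 in `ratings` serve as predicted scores for unrated movies. Note that `svds` requires k to be smaller than both matrix dimensions.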

🔄 Data Processing Pipeline

1. Data Sources

Primary Dataset: main_data.csv (6,012 movies, 1M+ ratings)
External APIs: TMDB for movie metadata
Web Scraping: IMDB for user reviews

2. Feature Engineering

# Movie feature combination
data['comb'] = data['movie_title'] + ' ' + data['cast'] + ' ' + data['director'] + ' ' + data['genres']
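
One caveat with plain string concatenation in pandas: if any of the four columns is missing for a row, the whole `comb` value becomes NaN, and the vectorizer will then fail on it. A defensive variant might look like this sketch, which runs on made-up rows and also lowercases the text (an assumption based on the "normalize case" step listed under preprocessing).

```python
import pandas as pd

# Made-up rows; the second movie has no cast information.
data = pd.DataFrame({
    "movie_title": ["space war", "love story"],
    "cast": ["alice bob", None],
    "director": ["carol", "dave"],
    "genres": ["Action Sci-Fi", "Romance"],
})

text_cols = ["movie_title", "cast", "director", "genres"]
data["comb"] = (
    data[text_cols]
    .fillna("")                               # missing fields become empty strings
    .agg(" ".join, axis=1)                    # join the four fields row by row
    .str.replace(r"\s+", " ", regex=True)     # collapse gaps left by empty fields
    .str.strip()
    .str.lower()
)

print(data["comb"].tolist())
```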

3. Data Preprocessing

Text Cleaning: Remove special characters, normalize case
Missing Value Handling: Drop or impute based on context
Feature Selection: Extract year from title, create genre indicators

4. Model Serialization

# Save trained models (context managers close the files after writing)
with open('nlp_model.pkl', 'wb') as f:
    pickle.dump(clf, f)
with open('tranform.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)
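
A quick round-trip check can confirm that the saved files restore a working pipeline. The sketch below trains a throwaway two-sentence model and writes to a temporary directory, so it does not touch the project's real nlp_model.pkl or tranform.pkl; the training sentences are made up.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Throwaway model, only used to demonstrate the save/load cycle.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(["good film", "bad film"])
clf = MultinomialNB().fit(X, [1, 0])

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "nlp_model.pkl"
    vec_path = Path(tmp) / "tranform.pkl"

    # Save both objects; the classifier is useless without its vectorizer.
    with open(model_path, "wb") as f:
        pickle.dump(clf, f)
    with open(vec_path, "wb") as f:
        pickle.dump(vectorizer, f)

    # Load them back as the Flask app would at startup.
    with open(model_path, "rb") as f:
        clf_loaded = pickle.load(f)
    with open(vec_path, "rb") as f:
        vec_loaded = pickle.load(f)

sample = ["good film"]
before = clf.predict(vectorizer.transform(sample)).tolist()
after = clf_loaded.predict(vec_loaded.transform(sample)).tolist()
print(before == after)
```

Pickle files can execute arbitrary code when loaded, so only load model files from a source you trust. Loading with a different scikit-learn version than the one used for saving may also produce warnings or errors.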

API Endpoints

Core Routes

| Endpoint | Method | Purpose | Parameters |
|----------|--------|---------|------------|
| `/` or `/home` | GET | Main application page | None |
| `/similarity` | POST | Get movie similarity scores | `name` (movie title) |
| `/recommend` | POST | Generate recommendations | Multiple form fields |
| `/genres` | GET | Genre selection page | None |
| `/genre` | POST | Genre-based recommendations | `Genre`, `Year` |

Request/Response Examples

Similarity Endpoint:

// Request
$.post('/similarity', {name: 'The Matrix'})

// Response
"Terminator 2: Judgment Day---The Matrix Reloaded---..."

Recommendation Endpoint:

// Request
$.post('/recommend', {
    title: 'The Matrix',
    cast_ids: '[1,2,3]',
    // ... other fields
})

// Response
// Rendered HTML template with movie details
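
On the server side, a route that produces the `---`-delimited similarity response could be sketched as below. This is an assumption-laden illustration, not the code in main.py: the `rcmd` lookup, its tiny catalog, and the not-found message are all hypothetical stand-ins for the real recommendation call.

```python
from flask import Flask, request

app = Flask(__name__)


def rcmd(title):
    """Hypothetical stand-in for the recommendation engine."""
    catalog = {
        "the matrix": ["Terminator 2: Judgment Day", "The Matrix Reloaded"],
    }
    return catalog.get(title.lower())


@app.route("/similarity", methods=["POST"])
def similarity():
    # The front end sends the title as the form field 'name'.
    name = request.form.get("name", "")
    titles = rcmd(name)
    if titles is None:
        return "Movie not found"  # hypothetical error message
    # Join titles with '---' so the client can split them again.
    return "---".join(titles)


# Exercise the route without starting a server.
with app.test_client() as client:
    body = client.post("/similarity", data={"name": "The Matrix"}).get_data(as_text=True)
    missing = client.post("/similarity", data={"name": "Unknown"}).get_data(as_text=True)

print(body.split("---"))
```

A plain delimiter keeps the client code simple, but it breaks if a title ever contains `---`; returning JSON would avoid that edge case.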

🛠️ Installation & Setup

Prerequisites

Python 3.13+
pip package manager
Virtual environment (recommended)

Step-by-Step Installation

Clone Repository

git clone <repository-url>
cd Movie-Recommendation-System

Create Virtual Environment

python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

Install Dependencies
```
pip install -r requirements.txt
```

Verify Installation

python -c "import flask, sklearn, pandas, numpy; print('All packages installed successfully')"

Run Application
```
python main.py
```
Access Application
- Open browser: http://127.0.0.1:5000
- Debug mode enabled by default

📖 Usage Guide