
Sentiment Analysis with LSTM

This project implements a Sentiment Analysis model using LSTM (Long Short-Term Memory) networks on the IMDB movie reviews dataset.

📌 Project Details

  • Objective: Classify movie reviews as positive or negative.
  • Tech Stack: Python, TensorFlow/Keras, Pandas, NumPy, Matplotlib, Scikit-learn.
  • Approach:
    • Preprocess the text data.
    • Tokenize and pad sequences.
    • Train an LSTM neural network to learn sentiment patterns.
    • Evaluate the model performance.

📊 Dataset

  • Name: IMDB Dataset of 50K Movie Reviews.
  • Source: Kaggle
  • Description:
    • 50,000 reviews labeled as positive or negative.
    • Balanced dataset (25,000 positive / 25,000 negative).
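Loading the CSV and mapping the labels can be sketched as follows (shown on a tiny stand-in DataFrame; the Kaggle file has the same two columns, `review` and `sentiment`, with 50,000 rows):

```python
import pandas as pd

# Tiny stand-in for IMDB Dataset.csv; the real file has the same
# columns ("review", "sentiment") but 50,000 rows.
df = pd.DataFrame({
    "review": ["A wonderful, moving film!", "Terrible plot and boring acting."],
    "sentiment": ["positive", "negative"],
})

# Map labels: positive -> 1, negative -> 0
df["label"] = df["sentiment"].map({"positive": 1, "negative": 0})
```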

📈 Training Results

The model was trained for 5 epochs with the following performance:

  • Epoch 1: Accuracy = 51.48%, Loss = 0.6938, Val Accuracy = 54.51%, Val Loss = 0.6889
  • Epoch 2: Accuracy = 56.55%, Loss = 0.6736, Val Accuracy = 53.46%, Val Loss = 0.7004
  • Epoch 3: Accuracy = 56.17%, Loss = 0.6803, Val Accuracy = 58.44%, Val Loss = 0.6674
  • Epoch 4: Accuracy = 59.64%, Loss = 0.6568, Val Accuracy = 60.69%, Val Loss = 0.6646
  • Epoch 5: Accuracy = 72.97%, Loss = 0.5580, Val Accuracy = 82.66%, Val Loss = 0.4459

Final Test Accuracy: 82.66%
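The curves above can be plotted directly from the logged values (a sketch using Matplotlib, which is already in the tech stack; the output filename `accuracy.png` is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also works headless
import matplotlib.pyplot as plt

# Accuracy values copied from the training log above
epochs = [1, 2, 3, 4, 5]
acc = [0.5148, 0.5655, 0.5617, 0.5964, 0.7297]
val_acc = [0.5451, 0.5346, 0.5844, 0.6069, 0.8266]

plt.plot(epochs, acc, marker="o", label="training accuracy")
plt.plot(epochs, val_acc, marker="o", label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("accuracy.png")
```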

βš™οΈ Workflow

  1. Data Loading: Import the dataset into a Pandas DataFrame.
  2. Preprocessing:
    • Remove HTML tags and special characters.
    • Convert text to lowercase.
    • Map labels: positive → 1, negative → 0.
  3. Tokenization & Padding:
    • Convert words into integer sequences using the Keras Tokenizer.
    • Pad sequences so every review has the same length (200 tokens).
  4. Model Building:
    • Use Embedding + LSTM + Dense layers.
  5. Training:
    • Optimizer: Adam
    • Loss: Binary Crossentropy
    • Epochs: 5
    • Batch Size: 128
  6. Evaluation:
    • Measure accuracy on test set.
    • Plot training vs validation accuracy.
  7. Prediction: Test the model with custom reviews.
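Steps 2–3 can be sketched without TensorFlow to show the transformations involved (the function names are illustrative; `pad_sequence` mimics the pre-padding/pre-truncation defaults of Keras `pad_sequences`):

```python
import re

def clean_review(text: str) -> str:
    """Step 2: strip HTML tags and special characters, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML tags like <br />
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip().lower()

def pad_sequence(seq, maxlen=200):
    """Step 3: pad/truncate one integer sequence to a fixed length.

    Mimics the Keras pad_sequences defaults: zeros are prepended,
    and overly long sequences are truncated from the front.
    """
    seq = list(seq)[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq
```

For example, `clean_review("Loved it!<br />A MUST-see.")` yields `"loved it a must see"`, and `pad_sequence([5, 9], maxlen=4)` returns `[0, 0, 5, 9]`.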

πŸ—οΈ Model Architecture

  • Embedding Layer: Input dim = 10,000, Output dim = 64, Input length = 200
  • LSTM Layer: 128 units
  • Dropout Layer: 0.5
  • Dense Layer: 64 units, ReLU activation
  • Dropout Layer: 0.3
  • Output Layer: 1 unit, Sigmoid activation
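As a sanity check on the layer sizes above, the trainable-parameter counts can be derived by hand (assuming the embedding feeds the LSTM directly, as in the workflow; dropout layers add no parameters):

```python
vocab_size, emb_dim, lstm_units, dense_units = 10_000, 64, 128, 64

embedding = vocab_size * emb_dim                 # one 64-dim vector per token id
# LSTM: 4 gates, each with input weights, recurrent weights, and a bias
lstm = 4 * ((emb_dim + lstm_units) * lstm_units + lstm_units)
dense = lstm_units * dense_units + dense_units   # weights + biases
output = dense_units * 1 + 1                     # single sigmoid unit

total = embedding + lstm + dense + output        # 747,137 trainable parameters
```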

▶️ How to Run

  1. Clone or download this repository.
  2. Download the dataset and place IMDB Dataset.csv in the project folder.
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Run the script:
    python main.py
    

🚀 Future Improvements

  • Use pre-trained embeddings like GloVe.
  • Try BiLSTM/GRU architectures.
  • Experiment with transformers (BERT) for better accuracy.
  • Deploy the model as a web app using Flask or Streamlit.