This project implements a Sentiment Analysis model using LSTM (Long Short-Term Memory) networks on the IMDB movie reviews dataset.
- Objective: Classify movie reviews as positive or negative.
- Tech Stack: Python, TensorFlow/Keras, Pandas, NumPy, Matplotlib, Scikit-learn.
- Approach:
  - Preprocess the text data.
  - Tokenize and pad sequences.
  - Train an LSTM neural network to learn sentiment patterns.
  - Evaluate model performance.
- Name: IMDB Dataset of 50K Movie Reviews.
- Source: Kaggle
- Description:
  - 50,000 reviews labeled as positive or negative.
  - Balanced dataset (25,000 positive / 25,000 negative).
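A quick way to confirm the balance described above after downloading the file is to load it and count the labels. The sketch below assumes the Kaggle CSV has a `review` column and a `sentiment` column holding the strings `positive` / `negative`. The helper names `load_imdb` and `label_counts` are hypothetical and are not taken from the project's `main.py`.

```python
import io

import pandas as pd


def load_imdb(source) -> pd.DataFrame:
    """Load the IMDB CSV (a path or file-like object) and check its columns.

    Assumption: the columns are named "review" and "sentiment".
    """
    df = pd.read_csv(source)
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


def label_counts(df: pd.DataFrame) -> dict:
    """Return the number of reviews per sentiment label."""
    return df["sentiment"].value_counts().to_dict()


if __name__ == "__main__":
    # Tiny in-memory stand-in for "IMDB Dataset.csv".
    sample = io.StringIO(
        "review,sentiment\n"
        '"Loved every minute of it.",positive\n'
        '"A dull, lifeless mess.",negative\n'
    )
    print(label_counts(load_imdb(sample)))
```

On the full file, `label_counts(load_imdb("IMDB Dataset.csv"))` should report 25,000 reviews per label, matching the dataset description.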
The model was trained for 5 epochs with the following performance:
- Epoch 1: Accuracy = 51.48%, Loss = 0.6938, Val Accuracy = 54.51%, Val Loss = 0.6889
- Epoch 2: Accuracy = 56.55%, Loss = 0.6736, Val Accuracy = 53.46%, Val Loss = 0.7004
- Epoch 3: Accuracy = 56.17%, Loss = 0.6803, Val Accuracy = 58.44%, Val Loss = 0.6674
- Epoch 4: Accuracy = 59.64%, Loss = 0.6568, Val Accuracy = 60.69%, Val Loss = 0.6646
- Epoch 5: Accuracy = 72.97%, Loss = 0.5580, Val Accuracy = 82.66%, Val Loss = 0.4459

Final Test Accuracy: 82.66%
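The per-epoch numbers above are easier to compare as curves. This is a minimal matplotlib sketch that plots the reported training and validation accuracy, with values copied from the results list. The function name `plot_accuracy` and the output file `accuracy.png` are illustrative choices and are not part of the project. The `Agg` backend is selected so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, no display needed
import matplotlib.pyplot as plt

# Per-epoch accuracies (in %) copied from the results list.
EPOCHS = [1, 2, 3, 4, 5]
TRAIN_ACC = [51.48, 56.55, 56.17, 59.64, 72.97]
VAL_ACC = [54.51, 53.46, 58.44, 60.69, 82.66]


def plot_accuracy(epochs, train_acc, val_acc, path="accuracy.png"):
    """Draw training vs validation accuracy, save it to `path`, return the figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, train_acc, marker="o", label="Training accuracy")
    ax.plot(epochs, val_acc, marker="o", label="Validation accuracy")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Accuracy")
    ax.set_xticks(epochs)
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    return fig


if __name__ == "__main__":
    plot_accuracy(EPOCHS, TRAIN_ACC, VAL_ACC)
```

After training, the same function can take `history.history["accuracy"]` and `history.history["val_accuracy"]` from the `History` object that `model.fit` returns, provided the model was compiled with `metrics=["accuracy"]`. Keras reports those values as fractions rather than percentages.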
- Data Loading: Import the dataset into a Pandas DataFrame.
- Preprocessing:
  - Remove HTML tags and special characters.
  - Convert text to lowercase.
  - Map labels: positive → 1, negative → 0.
- Tokenization & Padding:
  - Convert words into integer sequences using the Keras Tokenizer.
  - Pad sequences so that every review has the same length (200 tokens).
- Model Building:
  - Use Embedding + LSTM + Dense layers.
- Training:
  - Optimizer: Adam
  - Loss: Binary Crossentropy
  - Epochs: 5
  - Batch Size: 128
- Evaluation:
  - Measure accuracy on the test set.
  - Plot training vs. validation accuracy.
- Prediction: Test the model with custom reviews.
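The preprocessing and tokenization steps above can be sketched in plain Python to make each transformation visible. The project itself uses the Keras `Tokenizer` and padding utilities. This sketch only approximates their default behaviour:

- Words are ranked by frequency, and indices start at 1.
- Only the `num_words - 1` most frequent words are kept.
- Unknown words are dropped.
- Sequences are padded and truncated at the front (`"pre"`).

The regular expressions and the helper names (`clean_text`, `build_vocab`, `to_padded`) are assumptions made for illustration.

```python
import re
from collections import Counter

# Label mapping from the workflow: positive -> 1, negative -> 0.
LABELS = {"positive": 1, "negative": 0}


def clean_text(text: str) -> str:
    """Strip HTML tags and special characters, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)         # e.g. "<br />" -> " "
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # drop punctuation (also splits "don't")
    return re.sub(r"\s+", " ", text).strip().lower()


def build_vocab(texts, num_words=10_000):
    """Map the num_words - 1 most frequent words to ids 1..num_words - 1.

    Id 0 is reserved for padding. Ties keep first-appearance order
    because Python's sort is stable, even with reverse=True.
    """
    counts = Counter(w for t in texts for w in t.split())
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {w: i for i, w in enumerate(ranked[: num_words - 1], start=1)}


def to_padded(texts, vocab, maxlen=200):
    """Convert texts to id lists, then pre-truncate and pre-pad to maxlen."""
    rows = []
    for t in texts:
        ids = [vocab[w] for w in t.split() if w in vocab]  # unknown words dropped
        ids = ids[-maxlen:]                                  # keep the last maxlen ids
        rows.append([0] * (maxlen - len(ids)) + ids)         # zeros in front
    return rows


if __name__ == "__main__":
    reviews = ["I loved this movie! <br /><br />Great acting.",
               "Terrible plot, terrible acting."]
    labels = [LABELS[s] for s in ["positive", "negative"]]
    cleaned = [clean_text(r) for r in reviews]
    vocab = build_vocab(cleaned, num_words=10)
    print(labels)                             # [1, 0]
    print(to_padded(cleaned, vocab, maxlen=5))  # [[4, 5, 6, 7, 1], [0, 2, 8, 2, 1]]
```

With the README's settings, the calls would be `build_vocab(cleaned, num_words=10_000)` and `to_padded(cleaned, vocab, maxlen=200)`. This keeps every id below 10,000, the input dimension of the Embedding layer.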
- Embedding Layer: Input dim = 10,000, Output dim = 64, Input length = 200
- LSTM Layer: 128 units
- Dropout Layer: 0.5
- Dense Layer: 64 units, ReLU activation
- Dropout Layer: 0.3
- Output Layer: 1 unit, Sigmoid activation
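The architecture above translates to a Keras `Sequential` model roughly as follows. This is a sketch, not the project's `main.py`. The fixed input length of 200 is expressed with `tf.keras.Input(shape=(200,))` rather than the Embedding's `input_length` argument, because recent Keras releases deprecate that argument.

The expected parameter count can be worked out by hand:

- Embedding: 10,000 × 64 = 640,000
- LSTM: 4 × (128 × (64 + 128) + 128) = 98,816
- Dense: 128 × 64 + 64 = 8,256
- Output: 64 + 1 = 65

The total is 747,137. The dropout layers add no parameters.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # Embedding input dim
EMBED_DIM = 64       # Embedding output dim
MAX_LEN = 200        # padded review length


def build_model() -> tf.keras.Model:
    """Embedding -> LSTM -> Dropout -> Dense -> Dropout -> sigmoid output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.LSTM(128),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # probability of "positive"
    ])
    # Optimizer and loss from the Training settings.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    model = build_model()
    model.summary()
    # Training with the README's settings; X_train, y_train, X_val, y_val are
    # hypothetical padded arrays produced by the preprocessing step:
    # model.fit(X_train, y_train, epochs=5, batch_size=128,
    #           validation_data=(X_val, y_val))
```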
- Clone or download this repository.
- Download the dataset and place `IMDB Dataset.csv` in the project folder.
- Install dependencies: `pip install -r requirements.txt`
- Run the script: `python main.py`
- Use pre-trained embeddings like GloVe.
- Try BiLSTM/GRU architectures.
- Experiment with Transformer models such as BERT for potentially higher accuracy.
- Deploy the model as a web app using Flask or Streamlit.
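To try the BiLSTM/GRU idea, one option is to keep every other layer fixed and swap only the recurrent layer, so that comparisons isolate the architecture change. The sketch below assumes the same layer sizes as the baseline. The `build_variant` helper and its option names are hypothetical.

With the default settings (`Bidirectional` concatenates both directions; Keras `GRU` uses `reset_after=True`), the parameter counts work out as follows:

- `"lstm"`: 747,137
- `"bilstm"`: 854,145
- `"gru"`: 722,817

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_variant(recurrent: str = "bilstm") -> tf.keras.Model:
    """Baseline stack with the recurrent layer swapped.

    recurrent: "lstm", "bilstm", or "gru" (hypothetical option names).
    """
    factories = {
        "lstm": lambda: layers.LSTM(128),
        "bilstm": lambda: layers.Bidirectional(layers.LSTM(128)),  # output dim 256
        "gru": lambda: layers.GRU(128),
    }
    if recurrent not in factories:
        raise ValueError(f"unknown recurrent layer: {recurrent!r}")
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(200,)),
        layers.Embedding(10_000, 64),
        factories[recurrent](),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    for name in ("lstm", "bilstm", "gru"):
        print(name, build_variant(name).count_params())
```

The preprocessing and training settings stay unchanged, so accuracy differences between the variants can be attributed to the recurrent layer rather than to other settings.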