Sentiment Analysis Project

A comprehensive sentiment analysis system with seven models: Multi-Layer Perceptron (MLP) Basic, MLP Enhanced, RoBERTa (transformer-based), Kernel Approximation, Randomized PCA, LSTM, and LSTM-RoBERTa. It includes a results tracking system that compares model performance.

Features

Multiple sentiment analysis models:
- Multi-Layer Perceptron (MLP) Basic: Simple neural network with TF-IDF features
- Multi-Layer Perceptron (MLP) Enhanced: Improved neural network with advanced preprocessing
- RoBERTa: Transformer-based model for state-of-the-art performance
- Kernel Approximation: Approximate RBF kernel features with linear classification
- Randomized PCA: Dimension reduction with logistic regression classifier
- LSTM: Long Short-Term Memory neural network for sequence processing
- LSTM-RoBERTa: Combined LSTM with RoBERTa embeddings for enhanced performance
Performance tracking and comparison
CSV and text output for model results

Getting Started

Prerequisites

Python 3.8+
TensorFlow 2.x
NumPy, Pandas
Scikit-learn
PyTorch
Transformers (Hugging Face)
Matplotlib
NLTK

Installation

pip install -r requirements.txt

Next, download the NLTK tokenizer data by running the following in a Python shell:

>>> import nltk
>>> nltk.download()

An installer window appears. On the 'Models' tab, select 'punkt' and 'punkt_tab' in the 'Identifier' column, then click Download to install the required files.
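
On a machine without a display (for example a remote server), the installer window may not be usable. The same two resources can typically be fetched by identifier instead. The following is a minimal sketch, assuming network access is available; the helper name `download_tokenizer_data` is hypothetical and not part of this project.

```python
# Identifiers taken from the installation steps above.
RESOURCES = ("punkt", "punkt_tab")


def download_tokenizer_data(download=None):
    """Fetch each resource by identifier and report whether it succeeded.

    `download` is a hypothetical hook for substituting the downloader;
    by default nltk.download is used.
    """
    if download is None:
        import nltk  # imported lazily so the helper can be exercised offline
        download = nltk.download
    # nltk.download returns True on success and False on failure.
    return {name: bool(download(name, quiet=True)) for name in RESOURCES}
```

Calling `download_tokenizer_data()` once after `pip install -r requirements.txt` should be enough; a `False` value in the returned dictionary indicates a resource that still needs to be installed.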

Usage

Running the Pipeline

The main pipeline can be run with different modes:

python src/run_pipeline.py --mode [MODE] [OPTIONS]

Training Models

Train All Models

To train all models at once:

python src/run_pipeline.py --mode train_all --include_all --sample_size 5000

Train Individual Models

Train specific models using their dedicated modes:

# MLP Basic model
python src/run_pipeline.py --mode train_mlp_basic --sample_size 5000

# MLP Enhanced model
python src/run_pipeline.py --mode train_mlp_enhanced --sample_size 5000

# RoBERTa model
python src/run_pipeline.py --mode train_roberta --sample_size 1000

# Kernel Approximation model
python src/run_pipeline.py --mode train_kernel --sample_size 5000

# Randomized PCA model
python src/run_pipeline.py --mode train_pca --sample_size 5000

# LSTM model
python src/run_pipeline.py --mode train_lstm --sample_size 1000

# LSTM-RoBERTa model
python src/run_pipeline.py --mode train_lstm_roberta --sample_size 1000
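
When several models need to be trained individually in one go, the commands above can be driven from a small script. This is a sketch rather than part of the project: the helper names `build_train_command` and `train_individually` are hypothetical, and the modes and sample sizes are simply copied from the commands above.

```python
import subprocess
import sys

# (mode, sample size) pairs copied from the individual training commands.
TRAIN_RUNS = [
    ("train_mlp_basic", 5000),
    ("train_mlp_enhanced", 5000),
    ("train_roberta", 1000),
    ("train_kernel", 5000),
    ("train_pca", 5000),
    ("train_lstm", 1000),
    ("train_lstm_roberta", 1000),
]


def build_train_command(mode, sample_size, script="src/run_pipeline.py"):
    """Return the argument list for one training run."""
    return [sys.executable, script, "--mode", mode, "--sample_size", str(sample_size)]


def train_individually(runs=TRAIN_RUNS, runner=subprocess.run):
    """Run each training command in turn and return {mode: exit code}."""
    exit_codes = {}
    for mode, sample_size in runs:
        # check=False: keep going even if one model fails to train.
        completed = runner(build_train_command(mode, sample_size), check=False)
        exit_codes[mode] = completed.returncode
    return exit_codes
```

Run `train_individually()` from the repository root so that the relative script path resolves. A non-zero exit code in the result marks a run that failed.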

Testing Models

Test All Models

To test all trained models at once:

python src/run_pipeline.py --mode test_all --include_all --sample_size 1000

Test Individual Models

Test a specific model by using the test mode together with --test_model and the model name:

# Test MLP Basic model
python src/run_pipeline.py --mode test --test_model mlp_basic --sample_size 1000

# Test MLP Enhanced model
python src/run_pipeline.py --mode test --test_model mlp_enhanced --sample_size 1000

# Test RoBERTa model
python src/run_pipeline.py --mode test --test_model roberta --sample_size 1000

# Test Kernel Approximation model
python src/run_pipeline.py --mode test --test_model kernel --sample_size 1000

# Test Randomized PCA model
python src/run_pipeline.py --mode test --test_model pca --sample_size 1000

# Test LSTM model
python src/run_pipeline.py --mode test --test_model lstm --sample_size 1000

# Test LSTM-RoBERTa model
python src/run_pipeline.py --mode test --test_model lstm_roberta --sample_size 1000

Comparing Models

To compare all models without retraining:

python src/run_pipeline.py --mode compare --include_all --skip_training

Available Modes

Mode	Description
`train_mlp_basic`	Train the MLP Basic model
`train_mlp_enhanced`	Train the MLP Enhanced model
`train_roberta`	Train the RoBERTa model
`train_kernel`	Train the Kernel Approximation model
`train_pca`	Train the Randomized PCA model
`train_lstm`	Train the LSTM model
`train_lstm_roberta`	Train the LSTM-RoBERTa model
`train_all`	Train all models and compare them
`test`	Test a specific model
`test_all`	Test all models
`compare`	Compare existing trained models

Common Options

Option	Description	Default
`--sample_size`	Number of samples to use for training or testing	20000
`--verbose`	Verbosity level (0=silent, 1=progress bar, 2=one line per epoch)	1
`--test_model`	Model to test when in 'test' mode (mlp_basic, mlp_enhanced, roberta, kernel, pca, lstm, lstm_roberta)	Required in 'test' mode
`--include_roberta`	Include RoBERTa in the comparison (for 'compare' mode)	False
`--include_all`	Include all models (used with the 'train_all', 'test_all', and 'compare' modes in the examples above)	False
`--timeout`	Timeout in seconds for each model script	1800
`--skip_training`	Skip model training and compare existing results	False
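
To make the two tables above concrete, here is an illustrative argparse parser that mirrors the listed modes, options, and defaults. It is not the project's actual parser: that `--mode` is required, that the boolean options are plain flags, and the helper names `build_parser` and `parse_args` are all assumptions made for this sketch.

```python
import argparse

# Values copied from the Available Modes and Common Options tables.
MODES = [
    "train_mlp_basic", "train_mlp_enhanced", "train_roberta", "train_kernel",
    "train_pca", "train_lstm", "train_lstm_roberta", "train_all",
    "test", "test_all", "compare",
]
TEST_MODELS = [
    "mlp_basic", "mlp_enhanced", "roberta", "kernel", "pca", "lstm", "lstm_roberta",
]


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the pipeline options")
    parser.add_argument("--mode", required=True, choices=MODES)
    parser.add_argument("--sample_size", type=int, default=20000)
    parser.add_argument("--verbose", type=int, choices=[0, 1, 2], default=1)
    parser.add_argument("--test_model", choices=TEST_MODELS)
    parser.add_argument("--include_roberta", action="store_true")
    parser.add_argument("--include_all", action="store_true")
    parser.add_argument("--timeout", type=int, default=1800)
    parser.add_argument("--skip_training", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce that 'test' mode names a model."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "test" and args.test_model is None:
        parser.error("--test_model is required when --mode is 'test'")
    return args
```

For example, `parse_args(["--mode", "test", "--test_model", "lstm"])` yields the defaults from the table for every option that was not given.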

Individual Scripts

You can also run the individual scripts directly:

python src/scripts/mlp_basic_main.py --sample_size 10000
python src/scripts/mlp_enhanced_main.py --sample_size 10000
python src/scripts/roberta_main.py --sample_size 1000
python src/scripts/kernel_approximation_main.py --sample_size 10000
python src/scripts/randomized_pca_main.py --sample_size 10000
python src/scripts/lstm_main.py --mode train --sample_size 1000
python src/scripts/lstm_roberta_main.py --mode train --sample_size 1000
python src/scripts/compare_models.py --include_all --skip_training

Results

After running the models, results are stored in:

src/model_results.json (raw data)
src/model_results.csv (CSV format for easy loading into pandas)
src/best_model_summary.txt (text summary of the best model)

The models themselves are stored in the src/models/ directory.
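
The three result files can be read back with the standard library alone. The sketch below assumes the default file names listed above and makes no assumption about column names or the JSON layout, since neither is documented here; the helper name `load_results` is hypothetical.

```python
import csv
import json
from pathlib import Path


def load_results(results_dir="src"):
    """Return (csv_rows, raw_json, summary_text) from the results directory."""
    base = Path(results_dir)
    # Each CSV row becomes a dict keyed by whatever header the file has.
    with open(base / "model_results.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(base / "model_results.json", encoding="utf-8") as f:
        raw = json.load(f)
    summary = (base / "best_model_summary.txt").read_text(encoding="utf-8")
    return rows, raw, summary
```

With pandas installed, `pandas.read_csv("src/model_results.csv")` gives the same rows as a DataFrame, which is more convenient for sorting and comparing models.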
