This project focuses on classifying emails as spam or not spam using Natural Language Processing (NLP) techniques. The repository includes code for training and evaluating models, as well as a dataset for experimentation.
The repository is organized as follows:
- models/: Directory containing the trained models.
- src/: Directory containing the source files:
  - FFNN.py: Script for training a Feedforward Neural Network.
  - main.py: Main script for running the classification.
  - utils.py: Utility functions for data processing and model evaluation.
- data/: Directory containing the dataset:
  - Oppositional_thinking_analysis_dataset.json: Dataset used for training and evaluation.
- desc/: Directory containing the project description:
  - NLP - Project 1_4.pdf: Project documentation and analysis.
Requirements:
- Python 3.x
- The libraries listed in requirements.txt, which can be installed with pip install -r requirements.txt
- Clone the repository and move into its src/ directory:
git clone https://github.com/damlakayikci/Spam-Email-Classification-NLP.git
cd Spam-Email-Classification-NLP/src
Run one of the following commands from the src/ directory:
- To train and run the Naive Bayes model:
python main.py nb
- To train and run the Feedforward Neural Network (FFNN):
python main.py ffnn
- To print statistics and plot graphs:
python main.py stats
- To compute the Pointwise Mutual Information (PMI) of 10 random words and print the most similar words:
python main.py pmi
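The nb command trains a Naive Bayes classifier; the project's own implementation is not reproduced in this README. As a rough illustration of the technique only, here is a minimal multinomial Naive Bayes sketch with add-alpha (Laplace) smoothing. The class name SimpleNaiveBayes, the whitespace tokenizer, and the toy spam/ham sentences are hypothetical and are not taken from the project code or dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (an assumption, not the project's preprocessing)
    return text.lower().split()


class SimpleNaiveBayes:
    """Hypothetical hand-written multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary = every word seen in any training document
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.class_totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        # log P(c) + sum of log P(w | c) over in-vocabulary tokens
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.class_totals[c] + self.alpha * len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # out-of-vocabulary words are ignored
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest log posterior (up to a constant)
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical toy data, not taken from the project's dataset
train_docs = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "project report attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = SimpleNaiveBayes().fit(train_docs, train_labels)
print(model.predict("claim your free money"))  # spam
print(model.predict("report for the meeting"))  # ham
```

Scores are summed in log space because multiplying many small per-word probabilities would quickly underflow to zero in floating point.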
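The pmi command reports Pointwise Mutual Information, PMI(w1, w2) = log2(P(w1, w2) / (P(w1) P(w2))), and lists the words most associated with each sampled word. The sketch below is a hedged illustration, not the project's code: it assumes document-level co-occurrence (a sliding context window is another common choice) and whitespace tokenization, and the helper names cooccurrence_counts, pmi, and most_similar are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count documents containing each word and each unordered word pair.

    Co-occurrence is defined at the document level (an assumption).
    """
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))  # sorted words => canonical pair order
    return word_df, pair_df, len(docs)


def pmi(w1, w2, word_df, pair_df, n_docs):
    """PMI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))); None if they never co-occur."""
    joint = pair_df[tuple(sorted((w1, w2)))]
    if joint == 0:
        return None
    p_joint = joint / n_docs
    return math.log2(p_joint / ((word_df[w1] / n_docs) * (word_df[w2] / n_docs)))


def most_similar(target, word_df, pair_df, n_docs, k=5):
    """Rank the other words by their PMI with the target word (highest first)."""
    scores = []
    for other in word_df:
        if other == target:
            continue
        score = pmi(target, other, word_df, pair_df, n_docs)
        if score is not None:
            scores.append((other, score))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Hypothetical toy corpus, not taken from the project's dataset
docs = [
    "free money now",
    "free money offer",
    "free lunch today",
    "lunch meeting today",
]
word_df, pair_df, n_docs = cooccurrence_counts(docs)
print(most_similar("free", word_df, pair_df, n_docs, k=3))
```

PMI tends to favor rare words, since a single chance co-occurrence between two infrequent words yields a high score; a minimum-frequency cutoff or positive PMI (clipping negative values to zero) are common mitigations.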