PhishNet detects phishing emails using machine learning (ML) models and Natural Language Processing (NLP). It provides a full pipeline from training to deployment, including a Flask web interface and trained model files.
Recognizing Phishing in Emails by Using Natural Language Processing & Machine Learning Techniques
git clone https://github.com/Mohammed20201991/PhishNet.git
cd PhishNet
This project uses Python 2.7.18 because some model dependencies require it.
📨 Option 1: mail Environment (for model testing via script)
# Create and activate virtual environment
py -2 -m virtualenv mail
mail\Scripts\activate
# Install dependencies
pip install -r mail_requirements.txt
# List installed packages (optional)
pip list
# Run the phishing detection script
cd Code
python Phishector.py
# Make sure to use the full path to the model pickle files
# Example:
# PhishNet/pickle_files/
# Save environment dependencies
pip freeze > mail_requirements.txt
# Deactivate when done
deactivate
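The note about full paths can be illustrated with a minimal sketch of loading a pickled model from an absolute path. The helper name `load_model` and the file name in the usage note are assumptions for illustration; they are not taken from `Phishector.py`.

```python
import os
import pickle


def load_model(pickle_dir, filename):
    """Load a pickled object, resolving the location to an absolute path first.

    `pickle_dir` and `filename` are hypothetical parameters for this sketch.
    """
    path = os.path.abspath(os.path.join(pickle_dir, filename))
    if not os.path.isfile(path):
        # Fail early with the resolved path so a wrong location is easy to spot.
        raise IOError("Model file not found: %s" % path)
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, `load_model("PhishNet/pickle_files", "model.pkl")` would resolve the directory relative to the current working directory; `model.pkl` is a placeholder name. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.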
Option 2: keras Environment (for the Flask web app)
# Create and activate the keras virtual environment (if not already created)
py -2 -m virtualenv keras
keras\Scripts\activate
# Install required packages
pip install -r keras_requirements.txt
# Run Flask in production mode (turns off debug mode, for cleaner logs)
set FLASK_ENV=production
# Run the Flask app
cd Code
python app.py
🔁 Example CURL request for API testing
curl -X POST -H "Content-Type: application/json" ^
-d "[{\"body_noFunctionWords\": 5, \"url_noIntLinks\": 2, \"body_richness\": 0.1, \"url_noLinks\": 3, \"url_linkText\": 1}]" ^
http://127.0.0.1:5000/predict
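The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_request` and `predict` are illustrative, and it assumes the `/predict` endpoint replies with a JSON body, which the curl example above does not confirm.

```python
import json

try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7, the version this project targets
    from urllib2 import Request, urlopen

PREDICT_URL = "http://127.0.0.1:5000/predict"

# Same feature values as the curl example.
FEATURES = [{
    "body_noFunctionWords": 5,
    "url_noIntLinks": 2,
    "body_richness": 0.1,
    "url_noLinks": 3,
    "url_linkText": 1,
}]


def build_request(features, url=PREDICT_URL):
    """Build a POST request carrying the feature list as JSON."""
    body = json.dumps(features).encode("utf-8")
    # Supplying `data` makes this a POST request.
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def predict(features, url=PREDICT_URL):
    """Send the request and decode the reply (assumed to be JSON)."""
    response = urlopen(build_request(features, url))
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
```

With the Flask app running locally, `predict(FEATURES)` sends the same payload as the curl command.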
To retrain the phishing detection model from scratch:
# Create and activate a virtual environment for training
py -2 -m virtualenv training
training\Scripts\activate
# Run the training script
python train/train_and_save_model.py
The dataset used in this project is publicly available on Kaggle:
📎 Phishing Email Dataset (SpamAssassin)
| Model | Accuracy | Precision | Recall | F1-score | ROC-AUC |
|---|---|---|---|---|---|
| LightGBM | 0.960 | 0.96 | 0.96 | 0.96 | 0.9934 |
| Gradient Boosting | 0.960 | 0.96 | 0.96 | 0.96 | 0.9924 |
| SVM | 0.932 | 0.91 | 0.92 | 0.91 | 0.9400 |
| Random Forest | 0.956 | 0.94 | 0.95 | 0.94 | 0.9894 |
| Extra Trees | 0.940 | 0.95 | 0.94 | 0.95 | 0.9923 |
| Bagging Classifier | 0.880 | 0.89 | 0.89 | 0.88 | 0.9550 |
| Naive Bayes | 0.970 | 0.96 | 0.96 | 0.96 | 0.9927 |
| Ensemble | 0.980 | 0.98 | 0.98 | 0.98 | 0.9956 |
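
For readers unfamiliar with the columns above, the sketch below shows the standard binary definitions of accuracy, precision, recall, and F1-score computed from confusion-matrix counts. The counts in the usage note are made-up illustrative values, not results from this project, and the table may report averaged (for example macro or weighted) variants of these metrics.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, false negatives.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / float(total) if total else 0.0
    precision = tp / float(tp + fp) if (tp + fp) else 0.0
    recall = tp / float(tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For instance, `classification_metrics(tp=90, fp=10, tn=95, fn=5)` gives an accuracy of 0.925 and a precision of 0.9 for those hypothetical counts.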
@misc{phishnet2025,
author = {Mohammed A. S. Al-Hitawi, Ahmed Hadi Ali AL-Jumaili, Nadaim, Mohammed AlSahibly, Ali Q Saeed, Taher M. Ghazal, Yaseen Hadi Ali},
title = {PhishNet: Recognizing Phishing in Emails by Using Natural Language Processing \& Machine Learning Techniques},
year = {2025},
publisher = {GitHub},
email = {al_hitawe@uofallujah.edu.iq},
affiliation = {Computer Centre University of Fallujah},
howpublished = {\url{https://github.com/Mohammed20201991/PhishNet}}
}
N. A. Mohammed et al., "Recognizing Phishing in Emails by Using Natural Language Processing & Machine Learning Techniques,"
2025 3rd International Conference on Cyber Resilience (ICCR), Dubai, United Arab Emirates, 2025, pp. 1-7,
doi: 10.1109/ICCR67387.2025.11292212.
keywords: {Phishing Detection;Natural Language Processing;Machine Learning;Ensemble Learning;Email Security;Classification},

