This notebook-based project demonstrates how to detect spam emails using Natural Language Processing and classic Machine Learning algorithms.
💡 Built entirely in a Jupyter Notebook using email.csv as the dataset.
- Exploratory Data Analysis (EDA) of spam vs. ham messages
- Data cleaning and duplicate removal
- Feature engineering: character count, word count, sentence count
- Text preprocessing using NLTK (tokenization, stopword removal, stemming)
- Label encoding (Spam = 0, Ham = 1)
- Vectorization using CountVectorizer and TF-IDF
- Model training with:
  - Logistic Regression
  - Support Vector Machine (SVM)
  - Random Forest
  - Decision Trees
  - Naive Bayes
  - AdaBoost, Bagging, Gradient Boosting
- Model comparison using accuracy, classification report, and confusion matrix
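The vectorize, train, and compare steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the messages are invented toy data (the notebook reads email.csv instead), only two of the listed models are shown, and scikit-learn's built-in English stop-word list stands in for the NLTK preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; the real data comes from email.csv.
train_texts = [
    "Win a free prize now, click here",
    "Limited offer, claim your cash reward today",
    "Congratulations, you won a free vacation",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report",
    "Thanks for your help with the presentation",
]
# Label convention from the feature list: Spam = 0, Ham = 1.
train_labels = [0, 0, 0, 1, 1, 1]

test_texts = ["Claim your free cash prize", "Can you send the report before lunch?"]
test_labels = [0, 1]

# Two of the listed classifiers; the others would plug into the same loop.
models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in models.items():
    # Assumption: sklearn's stop-word list replaces the NLTK cleaning step here.
    pipe = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"), clf)
    pipe.fit(train_texts, train_labels)
    preds = pipe.predict(test_texts)
    results[name] = {
        "accuracy": accuracy_score(test_labels, preds),
        "confusion_matrix": confusion_matrix(test_labels, preds, labels=[0, 1]),
    }

for name, scores in results.items():
    print(name, "accuracy:", scores["accuracy"])
    print(scores["confusion_matrix"])
```

Swapping `TfidfVectorizer` for `CountVectorizer` in the pipeline would give the bag-of-words variant of the same comparison.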
- Python 3.x
- pandas, numpy
- nltk
- matplotlib, seaborn
- scikit-learn
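Before opening the notebook, it may help to confirm that the packages listed above are installed. The snippet below is one possible check using only the standard library; the `missing_packages` helper is hypothetical and not part of the project.

```python
from importlib.util import find_spec

# Maps each listed package to its import name; scikit-learn is imported as "sklearn".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "nltk": "nltk",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the names of packages whose modules cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```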
- Open the notebook in Jupyter or Google Colab
- Make sure email.csv is present in the same directory
- Run the notebook cells step-by-step
- You'll see preprocessing, training, and evaluation all inside one file
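The dataset check in the steps above could be automated in a first cell. The sketch below assumes nothing beyond the file name email.csv; the `find_dataset` helper is hypothetical and does not appear in the notebook.

```python
from pathlib import Path

DATASET_NAME = "email.csv"  # file name taken from the steps above


def find_dataset(directory="."):
    """Return the path to email.csv inside `directory`, or raise if it is missing."""
    path = Path(directory) / DATASET_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{DATASET_NAME} not found in {Path(directory).resolve()}; "
            "place it next to the notebook before running the cells."
        )
    return path
```

Calling `find_dataset()` fails early with a readable message if the file is missing; the returned path can then be passed to `pandas.read_csv`.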
Spam_Email_Detection/
├── Spam_Email_Detection.ipynb
└── email.csv
NLP · Spam Classification · Email Filtering · TF-IDF · CountVectorizer · Scikit-learn · NLTK · Logistic Regression · Random Forest · Text Preprocessing · Model Evaluation · Python
All processing and training steps are performed inside the Jupyter Notebook itself; no external scripts are required beyond the packages listed above.