📧 Spam Email Detection using NLP & Machine Learning

This notebook-based project demonstrates how to detect spam emails using Natural Language Processing and classic Machine Learning algorithms.

💡 Built entirely in a Jupyter Notebook using email.csv as the dataset.


🔍 What’s Inside

  • Exploratory Data Analysis (EDA) of spam vs. ham messages
  • Data cleaning and duplicate removal
  • Feature engineering: character count, word count, sentence count
  • Text preprocessing using NLTK (tokenization, stopword removal, stemming)
  • Label encoding (Spam = 0, Ham = 1)
  • Vectorization using CountVectorizer and TF-IDF
  • Model training with:
    • Logistic Regression
    • Support Vector Machine (SVM)
    • Random Forest
    • Decision Tree
    • Naive Bayes
    • AdaBoost, Bagging, Gradient Boosting
  • Model comparison using accuracy, classification report, and confusion matrix
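The steps above can be sketched end to end on a tiny in-memory dataset. This is a minimal illustration, not the notebook's actual code: the example messages are invented, `preprocess` is a hypothetical helper, and tokenization and stopword removal use a regex and scikit-learn's built-in `ENGLISH_STOP_WORDS` in place of NLTK's tokenizer and stopword corpus, so no NLTK data downloads are needed. Only the stemmer comes from NLTK.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize on alphanumerics, drop stop words, and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS)


# Toy messages invented for illustration only (not taken from email.csv).
messages = [
    "Win a free prize now, click here",
    "Free cash offer, claim your prize today",
    "Urgent: you won free tickets, call now",
    "Claim your free reward, limited offer",
    "Are we still meeting for lunch tomorrow?",
    "Please send me the report by Friday",
    "Can you pick up the kids after school?",
    "Thanks for your help with the project",
]
# Encoding follows this README: Spam = 0, Ham = 1.
labels = [0, 0, 0, 0, 1, 1, 1, 1]

cleaned = [preprocess(m) for m in messages]

# Hold out 25% of the data, keeping the spam/ham ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit the vocabulary on the training split only, then reuse it for the test split.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = MultinomialNB()
model.fit(X_train_vec, y_train)
predictions = model.predict(X_test_vec)

accuracy = accuracy_score(y_test, predictions)
print("accuracy:", accuracy)
print(confusion_matrix(y_test, predictions, labels=[0, 1]))
```

Swapping `TfidfVectorizer` for `CountVectorizer`, or `MultinomialNB` for another scikit-learn classifier such as `LogisticRegression`, should only require changing the corresponding line, which is what makes a side-by-side model comparison straightforward.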

🛠 Libraries Used

  • Python 3.x
  • pandas, numpy
  • nltk
  • matplotlib, seaborn
  • scikit-learn

🧪 How to Run

  1. Open the notebook in Jupyter or Google Colab
  2. Make sure email.csv is present in the same directory
  3. Run the notebook cells step-by-step
  4. Preprocessing, training, and evaluation all run inside this single notebook
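As a quick check before running the cells, the dataset can be loaded and de-duplicated with pandas. This is a rough sketch under assumptions: `load_emails` is a hypothetical helper that does not appear in the notebook, and it makes no assumption about the column names in email.csv.

```python
from pathlib import Path

import pandas as pd


def load_emails(csv_path="email.csv"):
    """Load the dataset and drop exact duplicate rows."""
    path = Path(csv_path)
    if not path.exists():
        # Fail early with a clear message if the file is not next to the notebook.
        raise FileNotFoundError(f"{path} not found; place it next to the notebook")
    df = pd.read_csv(path)
    return df.drop_duplicates().reset_index(drop=True)


# Example usage inside the notebook:
# df = load_emails("email.csv")
# print(df.shape)
```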

📁 Files

Spam_Email_Detection/
├── Spam_Email_Detection.ipynb
└── email.csv


🔑 Keywords

NLP · Spam Classification · Email Filtering · TF-IDF · CountVectorizer · Scikit-learn · NLTK · Logistic Regression · Random Forest · Text Preprocessing · Model Evaluation · Python


📌 Note

All processing and training steps are performed inside the Jupyter Notebook itself. No external scripts are needed, only the libraries listed above.