
ShivendraSinha418/Email-Spam-Classifier


📧 Email Spam Classifier

A machine learning project that classifies emails as Spam or Not Spam using Google's Word2Vec embeddings and a Random Forest classifier, implemented entirely in a Jupyter Notebook.



🧠 Overview

This project explores a more context-aware approach to spam detection. Instead of using TF-IDF or Bag-of-Words features, it represents each email with semantic word embeddings from Google's pre-trained Word2Vec model and classifies those representations with a Random Forest.
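To make the embedding idea concrete, here is a minimal sketch of turning a tokenized email into a single vector by averaging its word vectors. The 3-dimensional toy vectors and the `average_vector` helper are illustrative assumptions, not code from the notebook; the real model supplies 300-dimensional vectors.

```python
import numpy as np

# Toy 3-d embeddings standing in for the 300-d Word2Vec vectors (hypothetical values).
toy_vectors = {
    "free": np.array([1.0, 0.0, 0.0]),
    "prize": np.array([0.0, 1.0, 0.0]),
    "meeting": np.array([0.0, 0.0, 1.0]),
}

def average_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# "unknownword" has no vector, so only "free" and "prize" contribute to the average.
email_vec = average_vector(["free", "prize", "unknownword"], toy_vectors, dim=3)
print(email_vec)
```

Out-of-vocabulary words are simply skipped here, and an email with no known words falls back to a zero vector; other fallbacks are possible.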


🚀 Features

  • Implemented in a single Jupyter Notebook
  • NLP-based text preprocessing
  • Google Word2Vec (pre-trained) for word embeddings
  • Vector averaging to represent entire emails
  • Random Forest Classifier for prediction
  • Evaluation with accuracy, precision, recall, and F1-score
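As a rough illustration of the evaluation metrics listed above, the sketch below computes them with scikit-learn on a handful of made-up labels. The label arrays are hypothetical and do not reflect results from this project.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # fraction of emails labelled correctly
precision = precision_score(y_true, y_pred)  # of emails flagged as spam, how many were spam
recall = recall_score(y_true, y_pred)        # of actual spam emails, how many were caught
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

Precision and recall are worth reading separately for spam filtering: low precision means legitimate mail is being flagged, while low recall means spam is getting through.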

🛠️ Technologies Used

  • Python 3.x
  • Jupyter Notebook
  • NLTK (text cleaning, stopword removal)
  • Gensim (Word2Vec model loading)
  • Scikit-learn (RandomForestClassifier & evaluation)

Workflow

  1. Text Preprocessing
    • Lowercase conversion
    • Tokenization
    • Stopword removal
  2. Embedding
    • Load Google's GoogleNews-vectors-negative300 Word2Vec model
    • Convert each email into a 300-d vector by averaging word vectors
  3. Classification
    • Train a Random Forest classifier
    • Use cross-validation & metrics to evaluate performance
  4. Prediction
    • Input: Raw email text
    • Output: Spam or Not Spam
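The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the stopword set is a small stand-in for NLTK's list, `TOY_VECTORS` holds hypothetical 2-dimensional vectors in place of the 300-dimensional GoogleNews model, and the six training emails, the helper names, and the Random Forest settings are made up for the example.

```python
import re

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small stand-in stopword set (assumption); the project uses NLTK for stopword removal.
STOPWORDS = {"a", "an", "the", "is", "to", "you", "your", "for", "at", "of", "and", "now"}

# Hypothetical 2-d vectors standing in for the pre-trained 300-d embeddings.
TOY_VECTORS = {
    "free": np.array([1.0, 0.1]),
    "win": np.array([0.9, 0.0]),
    "prize": np.array([1.0, 0.2]),
    "cash": np.array([0.8, 0.1]),
    "meeting": np.array([0.0, 1.0]),
    "report": np.array([0.1, 0.9]),
    "project": np.array([0.2, 1.0]),
    "lunch": np.array([0.0, 0.8]),
}

def preprocess(text):
    """Step 1: lowercase, tokenize on alphabetic runs, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def embed(tokens, vectors, dim):
    """Step 2: average the vectors of known tokens (zeros if none are known)."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)

# Made-up training data: (raw text, label) with 1 = spam, 0 = not spam.
emails = [
    ("Win a FREE prize now", 1),
    ("Free cash for you", 1),
    ("Claim your cash prize", 1),
    ("Project meeting at noon", 0),
    ("Please send the report", 0),
    ("Lunch after the meeting", 0),
]

X = np.array([embed(preprocess(text), TOY_VECTORS, dim=2) for text, _ in emails])
y = np.array([label for _, label in emails])

# Step 3: train the classifier (hyperparameters here are arbitrary choices).
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

def predict(text):
    """Step 4: raw email text in, 'Spam' or 'Not Spam' out."""
    vec = embed(preprocess(text), TOY_VECTORS, dim=2).reshape(1, -1)
    return "Spam" if clf.predict(vec)[0] == 1 else "Not Spam"

print(predict("Win free cash"))
print(predict("Project report meeting"))
```

With the real model, `TOY_VECTORS` would presumably be replaced by vectors loaded through Gensim, for example `KeyedVectors.load_word2vec_format(path, binary=True)` pointed at a local copy of the GoogleNews file, with `dim` set to 300. The loaded object supports the same `word in vectors` and `vectors[word]` lookups used in `embed`.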
