SMS Spam Detection Model Using NLP

Goal of the Model

The main objective of this project is to classify SMS messages as Spam or Ham (not spam) using various Natural Language Processing (NLP) techniques and machine learning algorithms.

Source of Data

The dataset used for this project (spam.csv) is sourced from Kaggle.

Workflow

Importing the required libraries
Loading the Dataset
Basic Understanding of Data
Data Preprocessing
Exploratory Data Analysis (EDA) & Text Preprocessing
Model Building, Training, and Evaluation
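
The loading and basic preprocessing steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the notebook's actual code: the inline sample stands in for spam.csv, and the column names `label` and `text` are placeholders, since the real file's headers are not stated here. The helper `load_messages` is hypothetical.

```python
import csv
import io

# Hypothetical inline sample standing in for spam.csv.
# The column names "label" and "text" are assumptions, not the real headers.
SAMPLE_CSV = """label,text
ham,"Are we still meeting at 5?"
spam,"WINNER!! Claim your free prize now"
ham,"Are we still meeting at 5?"
spam,"Free entry in a weekly competition, text WIN to enter"
"""


def load_messages(csv_text):
    """Parse CSV text into (label, text) rows.

    Encodes ham as 0 and spam as 1, and drops exact duplicate
    messages while keeping the first occurrence.
    """
    label_to_int = {"ham": 0, "spam": 1}
    seen = set()
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        text = record["text"].strip()
        if text in seen:
            continue  # skip exact duplicates
        seen.add(text)
        rows.append((label_to_int[record["label"]], text))
    return rows


messages = load_messages(SAMPLE_CSV)
for label, text in messages:
    print(label, text)
```

For a real file, the same function could be fed the contents of spam.csv, provided the column names are adjusted to match.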

Key Tasks in Text Preprocessing Using the Natural Language Toolkit (NLTK):

Lowercasing all text data
Tokenization: Splitting text into individual words
Removing special characters, stopwords, and punctuation
Stemming: Reducing words to their root form
Finding and analyzing the 30 most frequent words in Ham and Spam messages
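
The preprocessing tasks above can be sketched in plain Python. Note the substitutions, all of which are assumptions made to keep the example self-contained: a regular expression stands in for NLTK's tokenizer, `STOPWORDS` is a tiny hand-written set rather than NLTK's English stopword list, and `naive_stem` is a crude suffix stripper, not a real stemming algorithm such as Porter's.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); NLTK's English list is much larger.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your",
             "i", "at", "for", "and", "we"}


def naive_stem(word):
    """Crude suffix stripper used as a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop punctuation and stopwords, then stem."""
    text = text.lower()
    # Keeping only alphanumeric runs removes punctuation and special characters.
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


def most_common_words(messages, label, n=30):
    """Return the n most frequent preprocessed tokens for one class."""
    counts = Counter()
    for msg_label, text in messages:
        if msg_label == label:
            counts.update(preprocess(text))
    return counts.most_common(n)


sample = [
    ("spam", "Free prize! Claim your free prize now"),
    ("spam", "Win a free ticket"),
    ("ham", "See you at lunch"),
]
print(preprocess("Claiming your FREE prizes!!!"))
print(most_common_words(sample, "spam", n=2))
```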

Feature Engineering in EDA:

Calculated the number of characters, words, and sentences in each message
Performed Univariate/Bivariate Analysis to explore the dataset
Applied SMOTE to handle the class imbalance in the dataset
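
To illustrate the idea behind SMOTE, here is a minimal from-scratch sketch: each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function `smote`, its parameters, and the toy 2-D vectors are illustrative assumptions. The notebook presumably relies on a library implementation applied to vectorized text, which this sketch does not reproduce.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate n_synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy feature vectors (assumption): 3 spam points versus 6 ham points.
spam_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
n_ham = 6
new_points = smote(spam_points, n_synthetic=n_ham - len(spam_points))
print(len(spam_points) + len(new_points), "spam points after oversampling")
```

Oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used for evaluation.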

Modeling and Algorithms

We built, trained, and evaluated models using the following algorithms:

Naive Bayes
Support Vector Machine
K-Nearest Neighbors
Decision Tree
Logistic Regression
Random Forest
AdaBoost
Bagging Classifier
Extra Trees Classifier
Gradient Boosting
XGBoost
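
As an illustration of the first algorithm in the list, the following is a from-scratch sketch of a multinomial Naive Bayes classifier over token counts with Laplace smoothing. The class name `TinyMultinomialNB` and the toy training data are hypothetical. The source does not say which Naive Bayes variant or library implementation the notebook used, so this shows the general technique only.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def _log_score(self, tokens, c):
        n_docs = sum(self.class_counts.values())
        log_prior = math.log(self.class_counts[c] / n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        # Tokens never seen during training are ignored.
        log_likelihood = sum(
            math.log((self.word_counts[c][t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab
        )
        return log_prior + log_likelihood

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self._log_score(tokens, c))


# Toy, already-tokenized training data (assumption).
train_docs = [
    ["free", "prize", "win"],
    ["win", "cash", "now"],
    ["lunch", "today"],
    ["see", "you", "at", "lunch"],
]
train_labels = ["spam", "spam", "ham", "ham"]

nb = TinyMultinomialNB().fit(train_docs, train_labels)
print(nb.predict(["free", "cash"]))
print(nb.predict(["lunch", "today"]))
```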

Additionally, the following ensemble methods were applied to improve performance:

Voting Classifier
Stacking Classifier
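
The idea behind a voting ensemble can be sketched with hard (majority) voting: each base classifier casts one vote and the most common label wins. The three rule-based voters below are hypothetical stand-ins for trained models, and the source does not state whether hard or soft voting was used. Stacking, which trains a second-level model on the base models' predictions, is not shown.

```python
from collections import Counter


def contains_free(message):
    """Hypothetical voter: flags messages mentioning 'free'."""
    return "spam" if "free" in message.lower() else "ham"


def contains_link(message):
    """Hypothetical voter: flags messages containing a URL."""
    return "spam" if "http" in message.lower() else "ham"


def mostly_uppercase(message):
    """Hypothetical voter: flags messages written mostly in capitals."""
    letters = [ch for ch in message if ch.isalpha()]
    if not letters:
        return "ham"
    upper_ratio = sum(ch.isupper() for ch in letters) / len(letters)
    return "spam" if upper_ratio > 0.6 else "ham"


def majority_vote(classifiers, message):
    """Hard voting: return the label predicted by most classifiers."""
    votes = [clf(message) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


voters = [contains_free, contains_link, mostly_uppercase]
print(majority_vote(voters, "FREE entry, visit http://example.com"))
print(majority_vote(voters, "See you at lunch"))
```

Using an odd number of voters on a two-class problem avoids ties.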

Model Performance Evaluation

Model Performance Observation

Cross-Validation Accuracy:
- The average cross-validation accuracy is 99.85% with minimal variation across folds, indicating consistent performance.
Training Accuracy:
- The model reached 100% accuracy on the training data, meaning it fits the training set completely.
Test Accuracy:
- The test accuracy is 99.89%, indicating that the model generalizes well to unseen data.
Precision & Recall:
- Both precision and recall are 99.89%, meaning the model rarely flags ham as spam and rarely misses spam.
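
For reference, the metrics reported above can be computed from true and predicted labels as in the sketch below. The function name `classification_metrics` and the toy labels are illustrative assumptions; the numbers it prints are unrelated to the results reported here.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Of the messages flagged as spam, how many really were spam?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real spam messages, how many were caught?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (assumption): 1 = spam, 0 = ham.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

When precision, recall, and accuracy all coincide, one possible explanation is micro-averaging, under which the three are equal for single-label classification. The source does not state which averaging was used.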

Conclusion:

The model shows excellent performance with high accuracy, precision, and recall, making it a robust solution for SMS Spam Detection.

Shubhamd1234/SMS_Spam_Detection_Model_Using_NLP
