This project builds a machine learning spam detection system that classifies SMS messages as either spam or ham (legitimate). It uses natural language processing (NLP) techniques for feature extraction and compares several machine learning models for classification. The models evaluated are:
- Naive Bayes
- K-Nearest Neighbors (KNN)
- Decision Tree
- Support Vector Machine (SVM)
- Random Forest
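As a rough sketch of how the five model families listed above could be set up side by side: the example below assumes scikit-learn, which this README does not name, and uses a handful of invented messages rather than the project's dataset. The hyperparameters and the `CountVectorizer` step are placeholders chosen for illustration, not the project's actual configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy messages invented for illustration; not taken from the project's dataset.
messages = [
    "win a free prize now",
    "free cash claim now",
    "urgent claim your free prize",
    "are we meeting for lunch",
    "see you at home tonight",
    "call me when you are home",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Hyperparameters are assumptions, not the project's tuned values.
models = {
    "Naive Bayes": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, model in models.items():
    # Each pipeline turns text into word counts, then fits the classifier.
    pipeline = make_pipeline(CountVectorizer(), model)
    pipeline.fit(messages, labels)
    predictions[name] = pipeline.predict(["claim your free cash prize"])[0]

for name, label in predictions.items():
    print(f"{name}: {label}")
```

Keeping the models in a dictionary makes it straightforward to run the same training and evaluation loop over all of them and compare results by name.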
The dataset consists of SMS messages, each labeled either 'ham' (legitimate) or 'spam'. Preprocessing tokenizes the text and removes stopwords, and features are then extracted as n-grams.
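The preprocessing steps described above (tokenizing, stopword removal, n-gram extraction) can be illustrated with a minimal, standard-library-only sketch. The stopword list, the tokenization regex, and the function names here are assumptions made for the example; the project itself may rely on an NLP library and a fuller stopword list.

```python
import re

# A tiny illustrative stopword list (an assumption); a real run would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "you", "your", "is", "are", "for", "and"}


def tokenize(text):
    """Lowercase the text and split it into tokens of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Tokenize, remove stopwords, then collect 1-grams up to max_n-grams."""
    tokens = remove_stopwords(tokenize(text))
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


features = extract_features("Claim your FREE prize now!")
print(features)
```

Note that in this sketch stopwords are removed before n-grams are formed, so a bigram such as "claim free" can join words that were not adjacent in the original message. Whether the project orders the steps this way is not stated.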
The primary goals of this project are to:
- Preprocess the dataset and extract features from the SMS text.
- Build and evaluate several machine learning models for spam detection.
- Visualize the results and model performance using metrics such as accuracy, the confusion matrix, and the ROC curve.
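To make the evaluation metrics named above concrete, here is a small standard-library sketch of how accuracy, a confusion matrix, and ROC points could be computed by hand. The labels and scores are invented for illustration, the function names are hypothetical, and the project may well use a library's metric functions instead.

```python
def confusion_matrix(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Fraction of all messages that were classified correctly."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


def roc_points(y_true, scores, positive="spam"):
    """Return (FPR, TPR) pairs from sweeping the threshold; assumes both classes are present."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == positive)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented labels and spam scores, for illustration only.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam"]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
y_pred = ["spam" if s >= 0.5 else "ham" for s in scores]

cm = confusion_matrix(y_true, y_pred)
curve = roc_points(y_true, scores)
print(cm, accuracy(cm), auc(curve))
```

The list returned by `roc_points` can be passed to any plotting library to draw the ROC curve, with the false positive rate on the x-axis and the true positive rate on the y-axis.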