(a) Spam Ham Detection: The dataset is a text file with 5572 records. In this case study, I try to predict whether a new text is ham or spam, using the open-source Python library NLTK (Natural Language Toolkit) along with other modules. The preprocessing steps used here are:
-- Stop-word removal
-- Stemming
-- Lemmatization
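A minimal sketch of the stop-word removal and stemming steps with NLTK's `PorterStemmer`, assuming a simple regex tokenizer. `STOP_WORDS` is a small hypothetical stand-in for `nltk.corpus.stopwords.words("english")`, which needs `nltk.download("stopwords")` before first use. Lemmatization (for example with NLTK's `WordNetLemmatizer`) is left out here because it also requires a corpus download (WordNet).

```python
import re

from nltk.stem import PorterStemmer

# Hypothetical stand-in for nltk.corpus.stopwords.words("english"),
# which needs nltk.download("stopwords") before first use.
STOP_WORDS = {"a", "an", "the", "you", "are", "is", "to", "and", "of", "in"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    # Simple regex tokenizer (an assumption, not necessarily the project's tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("You are WINNING a free prize!"))
```

Stemming reduces "winning" to "win" so that inflected forms of a word count as one feature; stop words such as "you" and "are" are dropped because they carry little signal for telling spam from ham.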
Feature extraction and machine learning models used for spam detection:
-- Count Vectorizer and Tfidf Vectorizer (to convert text into numeric features)
-- Multinomial Naive Bayes classifier
-- Logistic Regression
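A sketch of how these pieces can fit together with scikit-learn, assuming `CountVectorizer` feeds Multinomial Naive Bayes and `TfidfVectorizer` feeds Logistic Regression (the pairing is an assumption). The six training texts are made up for illustration only; the project itself uses the 5572-record file.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration only (not the project's data).
texts = [
    "win a free prize now",
    "free cash prize claim now",
    "claim your free reward",
    "are we meeting for lunch today",
    "see you at the meeting",
    "lunch today at noon",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Raw word counts feeding Multinomial Naive Bayes.
nb_model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
# TF-IDF weights feeding Logistic Regression.
lr_model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())

for model in (nb_model, lr_model):
    model.fit(texts, labels)

new_texts = ["claim your free prize", "lunch meeting today"]
print(nb_model.predict(new_texts))
print(lr_model.predict(new_texts))
```

Wrapping each vectorizer and classifier in a pipeline means the vocabulary learned during `fit` is reused unchanged when new texts are classified. On real data, the models would normally be compared on a held-out split rather than on hand-picked sentences.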
(b) Text_Mining: Starting with a small collection of short texts, I applied Count Vectorizer with stop-word removal and stemming for text mining.