This repo contains files for a tweet sentiment analysis project. I performed preliminary data exploration, data visualization, and data cleaning on data from a [Kaggle competition](https://www.kaggle.com/c/twitter-sentiment-analysis2/data) to prepare it for training a classifier.
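The exact cleaning steps aren't listed here. As an illustration only, this is a minimal sketch of a typical tweet-cleaning pass. It assumes that cleaning means lowercasing and stripping URLs, @mentions, hashtag symbols, and non-letter characters. The `clean_tweet` function and its regex rules are hypothetical and are not the repo's actual code.

```python
import re

# Hypothetical cleaning rules; the repo's actual preprocessing may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, '#' symbols, and non-letters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user handles
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits, punctuation, emoji (also splits "don't")
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_tweet("@bob I LOVE this!!! #happy http://t.co/xyz"))  # → i love this happy
```

Whether to keep hashtag words, emoji, or negation contractions is a judgment call that can noticeably affect sentiment accuracy, so these rules are a starting point rather than a recommendation.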
I trained a Logistic Regression classifier using several different text representations:
- Bag of Words: accuracy ~74%
- TF-IDF: accuracy ~75%
- Doc2Vec: accuracy TBD
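To show how the Bag of Words and TF-IDF representations differ, here is a from-scratch sketch in pure Python. It is written without dependencies so it runs anywhere. The project itself may well have used a library such as scikit-learn, but nothing here confirms that. The sketch builds count vectors and TF-IDF vectors, using the smoothed IDF formula that scikit-learn's `TfidfTransformer` applies by default. It then fits a logistic regression with plain batch gradient descent. All function names, the learning rate, the epoch count, and the four-tweet toy dataset are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Assign each distinct whitespace token a column index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Raw term counts; tokens not in the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(doc.split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def idf_weights(docs, vocab):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, one weight per vocabulary column."""
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return [math.log((1 + len(docs)) / (1 + df[tok])) + 1 for tok in vocab]


def tfidf(doc, vocab, idf):
    """Term counts scaled by IDF, then L2-normalised."""
    vec = [c * w for c, w in zip(bag_of_words(doc, vocab), idf)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(x, w, b):
    """Return 1 (positive) if the decision score is above zero, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy, made-up data (1 = positive, 0 = negative); not from the Kaggle set.
docs = ["i love this movie", "what a great day", "i hate this", "what an awful day"]
labels = [1, 1, 0, 0]
vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)

featurizers = {
    "bow": lambda d: bag_of_words(d, vocab),
    "tfidf": lambda d: tfidf(d, vocab, idf),
}
models = {}
for name, featurize in featurizers.items():
    models[name] = (featurize, train_logreg([featurize(d) for d in docs], labels))

for name, (featurize, (w, b)) in models.items():
    print(name, predict(featurize("great movie"), w, b), predict(featurize("awful hate"), w, b))
```

The only difference between the two runs is the featurizer. Bag of Words passes raw counts, while TF-IDF down-weights tokens that appear in many tweets and scales each vector to unit length. The toy data is far too small to say anything about the accuracy figures reported for the real dataset.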
I hope to expand this mini-project by using more sophisticated methods for generating document embeddings, such as Doc2Vec, and by using the Twitter API to collect tweets to run through the classifier as a demo!