Text Analytics of Twitter Data
This code demonstrates a general workflow for text analytics. The steps are:
- Tokens
- DFM: Document-Feature Matrix
- TF-IDF
- SVD: Singular Value Decomposition
- Random Forest
- Prediction and Confusion Matrix
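
The first three steps can be sketched in a few lines of plain Python. This is a minimal illustration and is independent of the code in this repository: the example tweets, the function names (`tokenize`, `build_dfm`, `tf_idf`) and the exact TF-IDF weighting (relative term frequency multiplied by `log10(N / document frequency)`) are all assumptions made for the sketch. The SVD and random forest steps are left out because they need a numerical library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dfm(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term j in document i."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            matrix[i][index[term]] = count
    return vocab, matrix


def tf_idf(matrix):
    """Weight each count by relative term frequency times log10(N / doc freq)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Number of documents in which each term appears at least once.
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        total = sum(row)
        weighted.append([
            (count / total) * math.log10(n_docs / doc_freq[j])
            if total and doc_freq[j] else 0.0
            for j, count in enumerate(row)
        ])
    return weighted


# Hypothetical example tweets, not taken from the Kaggle dataset.
tweets = [
    "Forest fire near the town",
    "I love this town",
    "Fire crews fight the forest fire",
]
vocab, dfm = build_dfm(tweets)
weights = tf_idf(dfm)
print(vocab)
print(dfm)
```

A term that appears in every document gets a weight of zero under this scheme, which is why TF-IDF tends to push down very common words.
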
On a test set of 39 samples, the model reached an accuracy of 62.16%, with a sensitivity of 0.4667 and a specificity of 0.7273. As you can guess, the model is far from perfect. However, it gives a basic idea of how text analysis works.
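
For readers unfamiliar with these metrics, here is a minimal Python sketch of how accuracy, sensitivity and specificity follow from the four cells of a confusion matrix. The counts below are hypothetical and are not this project's results. The helper name `classification_metrics` and the choice of which class counts as "positive" are also assumptions.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive-class samples predicted correctly / incorrectly
    tn/fp: negative-class samples predicted correctly / incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
        "sensitivity": tp / (tp + fn),   # share of actual positives that are found
        "specificity": tn / (tn + fp),   # share of actual negatives that are found
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=12, fn=8, tn=15, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A sensitivity well below the specificity, as reported above, means the model misses many of the positive-class tweets while handling the negative class better.
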
The data is a randomly picked subset of the Kaggle dataset "Real or Not? NLP with Disaster Tweets". Because of limited computational power, I used 200 randomly chosen data points.
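
Drawing a reproducible random subset like this can be sketched as follows. This is a Python illustration under assumptions, not the sampling code used here: the helper `sample_rows`, the seed value and the in-memory stand-in rows are all hypothetical.

```python
import random


def sample_rows(rows, k, seed):
    """Return k rows drawn without replacement, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.sample(rows, k)


# Hypothetical stand-in for the full dataset: one dict per tweet.
rows = [{"id": i, "text": f"tweet {i}"} for i in range(1000)]
subset = sample_rows(rows, 200, seed=42)
print(len(subset))
```

Fixing the seed means the same 200 rows are selected on every run, which keeps the reported metrics comparable between runs.
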
Thanks to Data Science Dojo for their YouTube series "Introduction to Text Analytics in R".
- Dataset: https://www.kaggle.com/c/nlp-getting-started
- Data Science Dojo YouTube playlist: https://www.youtube.com/playlist?list=PL8eNk_zTBST8olxIRFoo0YeXxEOkYdoxi
Thank You