
zi78494umbcedu/SentimentAnalysisKafka


SentimentAnalysisKafka

A data-intensive application that analyzes the sentiment (positive, negative, or neutral) of tweets obtained through the Twitter APIs.

This project is a good starting point for those who have little or no experience with Kafka and Apache Spark Streaming.

Input data: Tweets that mention company names.

Main model: A data-intensive application that scales and runs efficiently with data models and encoding schemes. It preprocesses the tweets and applies sentiment analysis to them.

Output: Text containing all the tweets with their sentiment analysis and competitor names.

We use Python 3.11, Spark 3.5.0, and Kafka 3.6.0.

Part 1: Ingest Data using Kafka

This part sends tweets from the Twitter Sentiment Analysis data to Kafka. To do this, follow the instructions for ingesting data with Kafka.

Part 2: Tweet preprocessing and sentiment analysis

In this part, we receive tweets from Kafka and preprocess them with the pyspark library, which is Python's API for Spark. We then apply sentiment analysis using TextBlob, a Python library for processing textual data, and extract competitor names using the spaCy library.
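The per-tweet logic described above could look roughly like the following sketch. The cleaning rules, the zero threshold that separates positive from negative, and the function names are assumptions for illustration; the project may preprocess and label tweets differently. TextBlob is imported lazily so the pure-Python helpers can be tried without it installed.

```python
import re


def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions and #hashtags, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label (zero threshold is an assumption)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def analyze(text: str) -> tuple[float, str]:
    """Score one tweet with TextBlob and return (polarity, label)."""
    from textblob import TextBlob  # lazy import; requires `pip install textblob`

    polarity = TextBlob(clean_tweet(text)).sentiment.polarity
    return polarity, label_sentiment(polarity)


def extract_organizations(text: str, nlp) -> list[str]:
    """Return organization names found by a loaded spaCy pipeline.

    `nlp` is assumed to be something like spacy.load("en_core_web_sm").
    """
    return [ent.text for ent in nlp(text).ents if ent.label_ == "ORG"]
```

In a Spark job, functions like these would typically be wrapped as user-defined functions and applied to the column holding the tweet text.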

After sentiment analysis, we write the sentiment results and competitor names to a dashboard served by a Flask application. We can also store the results in a Parquet file, which is a data storage format.


Part 3: Data Warehouse (Snowflake) and OLAP (Online Analytical Processing)

A single source of truth: an integrated, non-volatile store for the historical data streamed by Spark. It supports complex queries that help analysts build daily, monthly, and quarterly summaries.

Example 1: Average Sentiment Score per Company

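To show the shape of the average-sentiment-per-company query, the sketch below runs it against an in-memory SQLite database as a stand-in for Snowflake. The table name `tweet_sentiment`, its columns, and the sample rows are invented for the example and are not taken from the project's warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.execute("CREATE TABLE tweet_sentiment (company TEXT, polarity REAL)")
conn.executemany(
    "INSERT INTO tweet_sentiment VALUES (?, ?)",
    [("Acme", 0.5), ("Acme", -0.25), ("Globex", 0.25), ("Globex", 0.75)],
)

# Average polarity and tweet count per company, best-rated company first.
avg_by_company = conn.execute(
    """
    SELECT company, AVG(polarity) AS avg_polarity, COUNT(*) AS tweets
    FROM tweet_sentiment
    GROUP BY company
    ORDER BY avg_polarity DESC
    """
).fetchall()

print(avg_by_company)  # [('Globex', 0.5, 2), ('Acme', 0.125, 2)]
```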

Example 2: Top Competitors by Mention Count

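A query of this kind might count one row per extracted competitor mention and keep the most frequent names. The sketch uses in-memory SQLite in place of Snowflake; the `competitor_mentions` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.execute("CREATE TABLE competitor_mentions (tweet_id INTEGER, competitor TEXT)")
conn.executemany(
    "INSERT INTO competitor_mentions VALUES (?, ?)",
    [(1, "Globex"), (2, "Globex"), (2, "Initech"),
     (3, "Globex"), (4, "Initech"), (5, "Umbrella")],
)

# Top two competitors by number of mentions; ties are broken alphabetically.
top_competitors = conn.execute(
    """
    SELECT competitor, COUNT(*) AS mentions
    FROM competitor_mentions
    GROUP BY competitor
    ORDER BY mentions DESC, competitor
    LIMIT 2
    """
).fetchall()

print(top_competitors)  # [('Globex', 3), ('Initech', 2)]
```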

Example 3: Sentiment Trend Over Time/Daily Summary

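A daily summary can be sketched by grouping tweets on the calendar day of their timestamp. As before, in-memory SQLite stands in for Snowflake, and the `tweet_sentiment` table with a `created_at` column and the sample rows are assumptions for illustration. SQLite's `DATE()` is used here; in Snowflake the day would more likely be derived with a function such as `TO_DATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.execute("CREATE TABLE tweet_sentiment (created_at TEXT, polarity REAL)")
conn.executemany(
    "INSERT INTO tweet_sentiment VALUES (?, ?)",
    [
        ("2023-12-07 09:15:00", 0.5),
        ("2023-12-07 18:30:00", -0.5),
        ("2023-12-08 08:00:00", 0.25),
        ("2023-12-08 12:00:00", 0.75),
    ],
)

# One row per day: average polarity and number of tweets, oldest day first.
daily_summary = conn.execute(
    """
    SELECT DATE(created_at) AS day, AVG(polarity) AS avg_polarity, COUNT(*) AS tweets
    FROM tweet_sentiment
    GROUP BY day
    ORDER BY day
    """
).fetchall()

print(daily_summary)  # [('2023-12-07', 0.0, 2), ('2023-12-08', 0.5, 2)]
```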
