An impact study of the relationship between financial news articles and the stock prices of publicly traded US companies.
- This application has 3 components: ETL, Analytics and Visualization
ETL
- Scrapes news articles from financial websites for targeted publicly listed US companies.
- Filters out noise such as hashtags, URLs and emojis, and groups articles by day
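
The noise filtering and per-day grouping described above could look roughly like the following. This is a minimal sketch written for Python 3, not the project's actual ETL code: the names `clean_text` and `group_by_day`, the regular expressions, and the `(date_string, text)` input shape are all assumptions made for illustration. The emoji pattern covers only a few common Unicode blocks.

```python
import re
from collections import defaultdict

# Illustrative patterns (assumptions, not the project's own rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji coverage: common pictograph and symbol blocks only.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_text(text):
    """Remove URLs, hashtags and emojis, then collapse whitespace."""
    for pattern in (URL_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def group_by_day(articles):
    """Group cleaned article texts by day.

    articles: iterable of (date_string, text) pairs (hypothetical shape).
    Returns a dict mapping each date string to a list of cleaned texts.
    """
    grouped = defaultdict(list)
    for day, text in articles:
        grouped[day].append(clean_text(text))
    return dict(grouped)
```

URLs are stripped before hashtags so that a `#fragment` inside a link is removed together with the link rather than leaving half a URL behind.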
Analytics
- Download the SentiWordNet sentiment word list from http://sentiwordnet.isti.cnr.it/
- Set up the Cloudera Quickstart VM: https://www.cloudera.com/downloads/quickstart_vms/5-10.html
- Load ETL data into HDFS
- Run sentiment analysis using MapReduce and Pig
- Reduce the sentiment values to one value per day for a given company
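
To make the map and reduce steps above concrete, here is a minimal plain-Python sketch of lexicon-based scoring followed by a per-day reduction. It is not the project's MapReduce or Pig code. It assumes the tab-separated SentiWordNet 3.0 layout (POS, ID, PosScore, NegScore, SynsetTerms, Gloss, with `#` comment lines), scores each word as the average of PosScore minus NegScore over its senses, and sums article scores per day. The function names and the choice of summing are assumptions for illustration.

```python
from collections import defaultdict


def load_lexicon(lines):
    """Build a word -> score map from SentiWordNet-style lines.

    Assumed columns: POS, ID, PosScore, NegScore, SynsetTerms, Gloss.
    A word's score is the mean of (PosScore - NegScore) over its senses.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed rows
        try:
            score = float(fields[2]) - float(fields[3])
        except ValueError:
            continue  # skip rows without numeric scores
        for term in fields[4].split():
            word = term.rsplit("#", 1)[0].lower()  # "good#1" -> "good"
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def map_article(day, text, lexicon):
    """Map step: emit (day, score) for one cleaned article."""
    score = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    return day, score


def reduce_by_day(pairs):
    """Reduce step: sum the article scores that share a day."""
    daily = defaultdict(float)
    for day, score in pairs:
        daily[day] += score
    return dict(daily)
```

Words missing from the lexicon contribute a score of zero, so unknown tokens neither raise nor lower an article's sentiment.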
Visualization
- Normalize daily sentiment values to a 5-day week
- Plot sentiment values against stock prices. Price data can be obtained from Yahoo Finance. For example, for IBM: https://finance.yahoo.com/quote/IBM/history/
- Run the R script to plot the data
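
One possible way to normalize daily sentiment to a 5-day week is sketched below. The rule used here is an assumption, not necessarily what the project does: sentiment from Saturday and Sunday is folded into the following Monday, on the reasoning that weekend news can only affect prices on the next trading day. Market holidays are ignored, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_trading_day(day):
    """Shift Saturday and Sunday forward to the following Monday."""
    weekday = day.weekday()  # Monday is 0, Saturday is 5, Sunday is 6
    if weekday >= 5:
        return day + timedelta(days=7 - weekday)
    return day


def normalize_to_weekdays(daily_scores):
    """Fold weekend sentiment into the next Monday.

    daily_scores: dict mapping datetime.date to a sentiment score.
    Returns a date-ordered dict containing weekdays only.
    """
    folded = defaultdict(float)
    for day, score in daily_scores.items():
        folded[to_trading_day(day)] += score
    return dict(sorted(folded.items()))
```

The output can then be joined with the price history on the date column before plotting.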
- Python 2.7 : To extract news headlines and articles
- newspaper : Python library to download news articles
- beautifulsoup : Python library to scrape web pages
- Cloudera Quickstart VM : To easily set up a Hadoop environment
- MapReduce, Hive and Pig : The project was run in the Cloudera Quickstart VM, which ships with Hive and Pig installed
- R : To graphically represent the sentiment analysis output
- All 3 Components (ETL, Analytics, Visualization) working
- Build REST APIs for stand-alone application
- Build User Interface for Visualization
- Basic Sentiment Analysis Classifier