Introduction

A study of the impact of financial news articles on the stock prices of publicly traded US companies.

Working

  • This application has 3 components: ETL, Analytics and Visualization

ETL

  1. Scrape news articles from financial websites for targeted publicly listed US companies.
  2. Filter out noise such as hashtags, URLs and emojis, then group the articles per day.
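
The filtering and grouping step could look roughly like the sketch below. It is an illustration, not the project's actual ETL code: the regular expressions, the function names `clean_text` and `group_by_day`, and the `(date_string, text)` input shape are all assumptions. It is written for Python 3, although the project itself lists Python 2.7.

```python
import re
from collections import defaultdict

# Assumed noise patterns; the project's real filters may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji/pictograph ranges only, not an exhaustive emoji matcher.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_text(text):
    """Strip URLs, hashtags and emojis, then collapse whitespace."""
    text = URL_RE.sub(" ", text)      # URLs first, so '#' fragments go with them
    text = HASHTAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split())


def group_by_day(articles):
    """Group cleaned article texts by day.

    `articles` is an iterable of (date_string, text) pairs (hypothetical shape).
    Articles that are empty after cleaning are dropped.
    """
    grouped = defaultdict(list)
    for day, text in articles:
        cleaned = clean_text(text)
        if cleaned:
            grouped[day].append(cleaned)
    return dict(grouped)
```

Grouping by a plain date string keeps the output easy to write as one record per day before it is loaded into HDFS.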

Analytics

  1. Download the sentiment word list (SentiWordNet) from http://sentiwordnet.isti.cnr.it/
  2. Set up the Cloudera Quickstart VM: https://www.cloudera.com/downloads/quickstart_vms/5-10.html
  3. Load the ETL output into HDFS.
  4. Run the sentiment analysis with MapReduce and Pig.
  5. Reduce the sentiment values to one value per day for a given company.
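
The map and reduce logic of steps 4 and 5 can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the project's MapReduce or Pig job: the tiny `LEXICON` is a made-up stand-in for the SentiWordNet word list, and averaging article scores per day is one possible way to "reduce" them.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> polarity); a stand-in for SentiWordNet.
LEXICON = {"gain": 0.5, "strong": 0.5, "loss": -0.5, "weak": -0.25}


def map_article(day, text, lexicon=LEXICON):
    """Map step: emit (day, score) for one article.

    The score is the sum of the polarities of known words;
    unknown words contribute 0.
    """
    words = text.lower().split()
    score = sum(lexicon.get(word, 0.0) for word in words)
    return day, score


def reduce_by_day(pairs):
    """Reduce step: average the article scores for each day (assumed aggregation)."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [running sum, article count]
    for day, score in pairs:
        totals[day][0] += score
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}
```

A possible usage is `reduce_by_day(map_article(day, text) for day, text in articles)`, where `articles` holds the per-day article texts for one company.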

Visualization

  1. Normalize the daily sentiment values to a 5-day week, so that they line up with the days on which stock prices are available.
  2. Plot the sentiment values against stock prices. Price data can be obtained from Yahoo Finance. For example, for IBM: https://finance.yahoo.com/quote/IBM/history/
  3. Run the R script to plot the data.
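
One way to normalize sentiment to a 5-day week is sketched below. How the project handles weekend articles is not stated, so this sketch assumes that Saturday and Sunday values are folded into the following Monday and averaged with it. Market holidays are ignored, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_trading_day(d):
    """Shift Saturday/Sunday to the following Monday; leave weekdays unchanged."""
    if d.weekday() == 5:   # Saturday
        return d + timedelta(days=2)
    if d.weekday() == 6:   # Sunday
        return d + timedelta(days=1)
    return d


def normalize_to_weekdays(daily_sentiment):
    """Map a {date: score} dict onto weekdays only.

    Scores that land on the same weekday are averaged (assumed policy).
    """
    buckets = defaultdict(list)
    for d, score in daily_sentiment.items():
        buckets[to_trading_day(d)].append(score)
    return {d: sum(scores) / len(scores) for d, scores in sorted(buckets.items())}
```

The resulting dates can then be joined with the daily price history before plotting.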

Requirements

  • Python 2.7 : To extract news headlines and articles
  • newspaper : Python library to download news articles
  • beautifulsoup : Python library to scrape web pages
  • Cloudera Quickstart VM : To set up a Hadoop environment easily
  • MapReduce, Hive and Pig : The project was run in the Cloudera Quickstart VM, which ships with Hive and Pig installed
  • R : To graphically represent the sentiment analysis output

Improvements

  • All 3 Components (ETL, Analytics, Visualization) working
  • Build REST APIs for stand-alone application
  • Build User Interface for Visualization
  • Basic Sentiment Analysis Classifier