namitmohale/StockCollector

Introduction

A study of how financial news articles relate to the stock prices of publicly traded US companies.

Working

  • This application has three components: ETL, Analytics, and Visualization

ETL

  1. Scrape news articles from financial websites for the targeted publicly listed US companies.
  2. Filter out noise such as hashtags, URLs, and emojis, then group the articles by day.
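
The filtering and grouping step could look roughly like the sketch below. It is not the repository's code: the regular expressions, the function names `clean_text` and `group_by_day`, and the `(date, text)` input shape are all assumptions made for illustration, and the emoji pattern only covers the most common Unicode pictograph blocks. It is written for Python 3, whereas the project itself lists Python 2.7.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the three kinds of noise named above.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji coverage (an assumption, not an exhaustive list).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_text(text):
    """Remove URLs, hashtags, and emojis, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split())


def group_by_day(articles):
    """Group cleaned article texts by date.

    `articles` is an iterable of (date_string, raw_text) pairs;
    the result maps each date string to a list of cleaned texts.
    """
    grouped = defaultdict(list)
    for day, text in articles:
        grouped[day].append(clean_text(text))
    return dict(grouped)
```

URLs are stripped before hashtags so that a `#fragment` inside a link is removed together with the link instead of leaving half a URL behind.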

Analytics

  1. Download the SentiWordNet sentiment word list from http://sentiwordnet.isti.cnr.it/
  2. Set up the Cloudera Quickstart VM: https://www.cloudera.com/downloads/quickstart_vms/5-10.html
  3. Load the ETL output into HDFS.
  4. Run the sentiment analysis with MapReduce and Pig.
  5. Reduce the sentiment values to one value per day for a given company.
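
To make the map and reduce logic of steps 4 and 5 concrete, here is a small local sketch in plain Python. It is not the project's MapReduce or Pig job. It assumes the tab-separated SentiWordNet 3.0 layout (POS, ID, PosScore, NegScore, SynsetTerms, Gloss), scores a word as the mean of PosScore minus NegScore over its senses, and sums word scores per (company, day). The function names and the `(company, day, text)` record shape are hypothetical.

```python
from collections import defaultdict


def load_lexicon(lines):
    """Parse SentiWordNet-style lines into {word: mean(PosScore - NegScore)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        try:
            pos, neg = float(fields[2]), float(fields[3])
        except ValueError:
            continue  # skip rows without numeric scores
        for term in fields[4].split():
            word = term.rsplit("#", 1)[0].lower()  # "able#1" -> "able"
            totals[word] += pos - neg
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def map_scores(records, lexicon):
    """Map step: emit ((company, day), score) for each article."""
    for company, day, text in records:
        score = sum(lexicon.get(tok, 0.0) for tok in text.lower().split())
        yield (company, day), score


def reduce_daily(pairs):
    """Reduce step: sum the scores that share a (company, day) key."""
    daily = defaultdict(float)
    for key, score in pairs:
        daily[key] += score
    return dict(daily)
```

Summing is only one possible aggregation; averaging per article would keep busy news days from dominating, and either choice changes how the plotted sentiment should be read.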

Visualization

  1. Normalize the daily sentiment values to a five-day week.
  2. Plot the sentiment values against stock prices. Price data can be obtained from Yahoo Finance; for example, for IBM: https://finance.yahoo.com/quote/IBM/history/
  3. Run the R script to plot data
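
The README does not say how the seven-day sentiment series is mapped onto a five-day week. One plausible approach, shown here purely as an assumption, is to roll Saturday and Sunday values forward to the following Monday and average everything that lands on the same day. The function names are hypothetical, and market holidays are not handled.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_weekday(day):
    """Move Saturday/Sunday to the following Monday; leave weekdays alone."""
    wd = day.weekday()  # Monday == 0 ... Sunday == 6
    if wd >= 5:
        return day + timedelta(days=7 - wd)
    return day


def normalize_to_weekdays(daily_scores):
    """Collapse {date: score} onto weekdays, averaging values that collide."""
    buckets = defaultdict(list)
    for day, score in daily_scores.items():
        buckets[to_weekday(day)].append(score)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Example: Friday stays put; Saturday and Sunday fold into Monday.
scores = {
    date(2017, 3, 3): 1.0,  # Friday
    date(2017, 3, 4): 2.0,  # Saturday
    date(2017, 3, 5): 4.0,  # Sunday
    date(2017, 3, 6): 0.0,  # Monday
}
normalized = normalize_to_weekdays(scores)
```

Rolling forward rather than backward reflects the idea that weekend news can only affect prices once the market reopens; the result can then be exported for the R plotting script.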

Requirements

  • Python 2.7 : To extract news headlines and articles
  • newspaper : Python library to download news articles
  • beautifulsoup : Python library to scrape web pages
  • Cloudera Quickstart VM : To easily set up a Hadoop environment
  • MapReduce, Hive and Pig : The project was run in the Cloudera Quickstart VM, which ships with Hive and Pig installed
  • R : To graphically represent the sentiment analysis output

Improvements

  • All 3 Components (ETL, Analytics, Visualization) working
  • Build REST APIs for stand-alone application
  • Build User Interface for Visualization
  • Basic Sentiment Analysis Classifier
