For my CS598 Cloud Computing Capstone we were tasked with building two data pipelines: one using batch-style processing and another using stream-style processing. This is my batch-processing project, Conair. I was tasked with architecting a big data processing solution to answer a multitude of questions using airline data from the years 1988-2008.
I decided to use Apache Hive on AWS EMR as my main application. The benefit of using Hive was that I could write queries in HQL, its SQL dialect, which abstracts away many of the finer details of MapReduce and let me focus on the core logic.
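
For a flavor of what that looks like, here is a hypothetical HQL query over the airline data; the table and column names are illustrative, not the exact schema I used:

```sql
-- Hypothetical example; the real schema and questions are in the report.
-- Top 10 origin/destination pairs by average arrival delay.
SELECT origin, dest, AVG(arr_delay) AS avg_arrival_delay
FROM flights
WHERE year BETWEEN 1988 AND 2008
GROUP BY origin, dest
ORDER BY avg_arrival_delay ASC
LIMIT 10;
```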
The main technologies I used for this project were EMR, Hadoop, Hive, DynamoDB, S3, EMRFS, and AWS Data Pipeline for orchestration. My report walks through my entire process, from extracting and cleaning the data to the optimizations I used to speed up queries.
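
As a rough sketch of how Hive and S3 fit together, EMRFS lets Hive read the raw data directly out of S3 through an external table along these lines (the bucket path and columns below are placeholders):

```sql
-- Placeholder bucket, path, and columns; EMRFS lets Hive treat the
-- S3 location as if it were HDFS.
CREATE EXTERNAL TABLE flights_raw (
  year      INT,
  origin    STRING,
  dest      STRING,
  arr_delay INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/airline-data/';
```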
For data extraction and cleaning, please take a look at some of my handyScripts.
You can follow along with the different optimizations and configurations I tried in my notes and random commands. I also took screenshots as I tried out different Hive optimizations while querying the tables.
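
To give an idea of the kinds of knobs involved, here are a few illustrative Hive settings and a partitioned ORC table; the exact configurations and values I tried are in the notes and screenshots:

```sql
-- Illustrative settings only; my notes cover what I actually ran.
SET hive.execution.engine=tez;              -- run on Tez instead of plain MapReduce
SET hive.vectorized.execution.enabled=true; -- process rows in batches
SET hive.cbo.enable=true;                   -- cost-based query optimizer

-- Storing tables as ORC and partitioning by year means queries that
-- filter on year only scan the matching partitions.
CREATE TABLE flights_orc (
  origin    STRING,
  dest      STRING,
  arr_delay INT
)
PARTITIONED BY (year INT)
STORED AS ORC;
```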
I also wrote Go CLI scripts to query the DynamoDB database for the Group 2 and 3.2 questions.
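
A minimal sketch of what such a query could look like with the AWS SDK for Go (v1), assuming a hypothetical airline_answers table keyed on a question attribute; the real table and key names are in the scripts:

```go
// Minimal sketch of a CLI query against DynamoDB using the AWS SDK for
// Go (v1). The table name "airline_answers" and the "question" key are
// placeholders; the real names are in the scripts in this repo.
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: query <question-key>")
	}

	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	svc := dynamodb.New(sess)

	// Look up the precomputed answer for one question key.
	out, err := svc.GetItem(&dynamodb.GetItemInput{
		TableName: aws.String("airline_answers"), // placeholder table name
		Key: map[string]*dynamodb.AttributeValue{
			"question": {S: aws.String(os.Args[1])},
		},
	})
	if err != nil {
		log.Fatalf("GetItem failed: %v", err)
	}
	fmt.Println(out.Item)
}
```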
