The following resources and references were used to build this project:
- LogFile Generator: https://github.com/0x1DOCD00D/LogFileGenerator
- Tutorial on MapReduce: https://hortonworks.com/wp-content/uploads/2012/03/Tutorial_Hadoop_HDFS_MapReduce.pdf
- Sharding data for Hadoop: https://www.mongodb.com/blog/post/in-the-loop-with-hadoop-take-advantage-of-data-locality-with-mongo-hadoop
- Educative.io course on Hadoop MapReduce: https://www.educative.io/courses/introduction-to-big-data-and-hadoop
- WordCount Example in Scala: https://dzone.com/articles/wordcount-on-hadoop-with-scala-emmanouil-gkatziour
- Apache Java Example: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Hadoop MapReduce code walkthrough:
- Code walkthrough: https://docs.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_wordcount1_source.html
- https://docs.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_usage.html
- https://docs.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_wordcount2.html

Creating JARs:
- https://www.baeldung.com/scala/sbt-fat-jar
- https://dzone.com/articles/wordcount-on-hadoop-with-scala-emmanouil-gkatziour
- https://www.datasciencecentral.com/profiles/blogs/how-to-install-and-run-hadoop-on-windows-for-beginners

HDFS:
- Guide to creating Hadoop FS setup: https://docs.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_usage.html
- Hortonworks Sandbox v3.0.1: https://www.educative.io/courses/introduction-to-big-data-and-hadoop