GitHub - kamalchaturvedi/SparkTopNMostOccuredWords: This the implementation of spark in Java, to exercise spark APIs, in order to find the top N most occured words in a bigData file

SparkTopNMostOccuredWords

This the implementation of Apache Spark in Java, in order to find the top N most occured words in a bigData file, by exercising the amazing spark APIs. There is detailed explaination of what task each line in code performs, in the Java file.

Build Jar : You can build the jar with the below mentioned command

maven clean install

Input : You can run the spark job in a stand-alone node by the command

spark-submit --class "com.kamal.SparkWordCount.SparkWordCountApplication" --master local[2] "./SparkWordCount.jar" ./big.txt ./output 10

Where you can replace

./big.txt with the input file path

./output with the output file path &

10 with the value of N (For the usecase : Top N most occured words)

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.mvn/wrapper		.mvn/wrapper
src/main/java/com/kamal/sparkTopN		src/main/java/com/kamal/sparkTopN
.gitignore		.gitignore
README.md		README.md
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

kamalchaturvedi/SparkTopNMostOccuredWords

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages