This is a Java implementation of a Spark job that exercises the Spark APIs to find the top N most frequently occurring words in a big data file.

kamalchaturvedi/SparkTopNMostOccuredWords

SparkTopNMostOccuredWords

This is a Java implementation, built on Apache Spark, that finds the top N most frequently occurring words in a big data file by exercising Spark's APIs. The Java source file contains detailed comments explaining what each line of code does.
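The core logic of the job can be sketched without Spark using plain Java streams: split lines into words, count each word, and keep the N largest counts. This is only an illustration under stated assumptions. The class name TopNWordsSketch, the lower-casing, and the [^a-z']+ tokenization rule are hypothetical choices, not taken from the repository. In a Spark version these stages would typically map onto RDD operations such as flatMap, mapToPair with reduceByKey, and a sort or takeOrdered; the repository's Java file may structure them differently.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, Spark-free sketch of the "top N words" logic.
public class TopNWordsSketch {

    // Returns the n most frequent words in the given lines, most frequent first.
    // Ties are broken alphabetically so the result is deterministic.
    static List<Map.Entry<String, Long>> topN(List<String> lines, int n) {
        Map<String, Long> counts = lines.stream()
                // Analogous to Spark's flatMap: split each line into lower-cased words.
                // The tokenization rule is an assumption made for this sketch.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("[^a-z']+")))
                .filter(word -> !word.isEmpty())
                // Analogous to mapToPair + reduceByKey: count occurrences per word.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Analogous to a sort followed by take(n): descending count, then by word.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "the quick brown fox",
                "jumps over the lazy dog",
                "The dog sleeps");
        // Prints: [the=3, dog=2]
        System.out.println(topN(lines, 2));
    }
}
```

On a single machine this holds every distinct word in memory. Spark's value is that the counting and ranking stages are distributed across partitions, so the same pipeline scales to files that do not fit on one node.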

Build Jar: You can build the jar with the following command:

mvn clean install

Input: You can run the Spark job locally on a single machine (here with two worker threads, local[2]) using the command:

spark-submit --class "com.kamal.SparkWordCount.SparkWordCountApplication" --master local[2] "./SparkWordCount.jar" ./big.txt ./output 10

where you can replace:

./big.txt with the input file path,

./output with the output path, and

10 with the value of N (for the use case: the top N most frequently occurring words).
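The three values after the jar path are passed to the application's main method as positional arguments. A driver would typically validate them before starting any Spark work. The following is a minimal, hypothetical sketch of such validation. The names JobArgsSketch, JobArgs and parse are illustrative and not from the repository, whose main method may handle its arguments differently.

```java
// Hypothetical sketch of validating the job's positional arguments.
public class JobArgsSketch {

    // Holder for <input path> <output path> <N>. Paths stay plain strings because
    // Spark accepts URIs such as hdfs:// as well as local file paths.
    static final class JobArgs {
        final String input;
        final String output;
        final int n;

        JobArgs(String input, String output, int n) {
            this.input = input;
            this.output = output;
            this.n = n;
        }
    }

    // Parses exactly three arguments and fails fast with a descriptive message.
    static JobArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: <input path> <output path> <N>, got " + args.length + " argument(s)");
        }
        int n;
        try {
            n = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("N must be an integer, got: " + args[2], e);
        }
        if (n <= 0) {
            throw new IllegalArgumentException("N must be positive, got: " + n);
        }
        return new JobArgs(args[0], args[1], n);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(new String[] {"./big.txt", "./output", "10"});
        // Prints: input=./big.txt output=./output n=10
        System.out.println("input=" + parsed.input + " output=" + parsed.output + " n=" + parsed.n);
    }
}
```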
