First, run merge_data.py:

    python3 merge_data.py

When running it from IntelliJ, use this configuration:

    ../merged_data.json ../sample_hadoop/

Note: sample_hadoop is only for wiki_1.json.
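For running outside IntelliJ, the same invocation could be assembled as below. This is only a sketch: it assumes the two paths in the configuration are passed to merge_data.py as positional command-line arguments in the order shown, which the note above does not confirm. The helper name build_merge_command and its parameter names are hypothetical, not part of merge_data.py.

```python
import shlex


def build_merge_command(json_path="../merged_data.json",
                        hadoop_dir="../sample_hadoop/",
                        python="python3"):
    """Return the argument list for running merge_data.py.

    Assumption: both paths are positional arguments, in the same
    order as they appear in the IntelliJ configuration.
    """
    return [python, "merge_data.py", json_path, hadoop_dir]


if __name__ == "__main__":
    # Print the command as a single shell-ready line.
    print(shlex.join(build_merge_command()))
    # To execute it instead, from the directory containing merge_data.py:
    #   import subprocess
    #   subprocess.run(build_merge_command(), check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems if either path is later changed to one that contains spaces.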