This is a Python MapReduce framework that runs on Hadoop (HDFS) and parses and validates large ("big data") XML datasets. It was written for a specific product schema, but the underlying framework can be adapted to any XML schema. It works best with the paramiko-scp MapReduce automation script that I wrote:
https://github.com/chris-relaxing/paramiko-scp
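
To give a rough idea of what validating XML with MapReduce can look like, here is a minimal sketch. It is not the code from this repository: the `REQUIRED_FIELDS` rule, the `<product>` record layout, and the function names `map_record`, `reduce_counts`, and `run_job` are all hypothetical, and the job runs in-process instead of on a Hadoop cluster. The map step emits one `(key, 1)` pair per validation outcome for each record, and the reduce step sums the pairs per key.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical schema rule (an assumption, not the real product schema):
# every record must contain these child elements.
REQUIRED_FIELDS = ("id", "name", "price")


def map_record(xml_text):
    """Map step: yield (key, 1) pairs describing one XML record."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # The record is not well-formed XML at all.
        yield ("malformed_xml", 1)
        return
    missing = [field for field in REQUIRED_FIELDS if root.find(field) is None]
    if not missing:
        yield ("valid", 1)
    for field in missing:
        yield ("missing_" + field, 1)


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(records):
    """Run map then reduce locally over an iterable of XML strings."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = [
        "<product><id>1</id><name>Widget</name><price>9.99</price></product>",
        "<product><id>2</id><name>Gadget</name></product>",
        "<product><id>3</id>",
    ]
    print(run_job(sample))
```

On a real cluster the same two functions would typically be split into separate mapper and reducer scripts, with Hadoop handling the shuffle between them; adapting the sketch to another schema would mostly mean changing the validation rule in the map step.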