
Conversation

@gregbaker

This adds Spark 1.4.0 to the cluster setup. I have tested it a little: Spark jobs can access HDFS files (as hdfs://master.local:9000/home/vagrant/...) and jobs can be submitted to the cluster with a command like this:

spark-submit --master yarn-cluster ...
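
For a concrete run, the SparkPi example class that ships with the distribution can be submitted the same way (the jar path below is a guess based on where a prebuilt 1.4.0 release puts its examples jar; adjust for your install):

spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi /home/vagrant/spark/lib/spark-examples-1.4.0-hadoop2.6.0.jar 10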

The download required during provisioning is about 240MB: I don't know if that's enough to justify leaving the spark manifest commented out in manifests/master-single.pp.

I haven't updated the README: again, I'm not sure it's worth advertising this there.

@gregbaker
Author

I have continued to add changes on my fork: I fiddled with the HDFS replication (so files aren't replicated to every node, which is more realistic) and updated the tool versions (to Hadoop 2.7.1 and other current releases). Feel free to cherry-pick as necessary if these aren't relevant to this project's goals.
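
If anyone wants to confirm what replication they're actually getting, fsck will report blocks and locations per file (the path here is just the example from my first comment):

hdfs fsck /home/vagrant -files -blocks -locations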

@tristanreid

Looks cool! I may fork this to add parquet-tools (https://github.com/Parquet/parquet-mr/tree/master/parquet-tools).
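
For anyone unfamiliar: parquet-tools is a small CLI for inspecting Parquet files, e.g. printing a file's schema (the jar name and HDFS path below are placeholders, depending on the version you build and what you've written):

hadoop jar parquet-tools-1.6.0.jar schema hdfs://master.local:9000/path/to/file.parquet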

@tristanreid

Greg, this is really great! One thing: HBase has moved from 1.1.1 to 1.1.2. The build only works for me if I make that change in modules/hbase/manifests/init.pp and modules/phoenix/manifests/init.pp.
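
If it helps, the quick fix is just bumping the version string in both manifests; assuming it appears literally as 1.1.1, something like:

sed -i 's/1\.1\.1/1.1.2/' modules/hbase/manifests/init.pp modules/phoenix/manifests/init.pp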
