Description
I have configured RDMA Hadoop and Spark by myself on an InfiniBand cluster and it works. However, when I try the submission script magpie.sbatch-srun-spark-with-yarn-and-hdfs (just to test Hadoop for now), Slurm allocates the nodes correctly but Hadoop doesn't work properly. The ResourceManager process appears in the jps output, but it doesn't actually start: resourcemanager.out shows an InfiniBand error, while the .log file shows no errors. As a result, the nodemanager .log files show a connection problem to the ResourceManager node.
It seems these scripts are not ready for this RDMA version of Hadoop and Spark, since I can make it work fine by myself with the conf files provided in the Hadoop guide that I followed. Any suggestions?
I would really appreciate any help you can provide.