RDMA Hadoop/Spark not working with Slurm submission scripts #276

@casty8

I configured RDMA Hadoop and Spark myself on an InfiniBand cluster, and that setup works. However, when I use the submission script magpie.sbatch-srun-spark-with-yarn-and-hdfs (only to test Hadoop for now), Slurm allocates the nodes correctly but Hadoop does not work properly. ResourceManager appears in the output of the jps command, but it does not actually start: resourcemanager.out shows an InfiniBand error, while the ResourceManager .log file shows no errors. As a result, the NodeManager .log files show a problem connecting to the ResourceManager node.

It seems these scripts are not ready for this RDMA version of Hadoop and Spark, because I can make it work fine on my own with the conf files provided in the Hadoop guide that I followed. Any suggestions?

I would really appreciate any help you can provide.
