Slurm nodes installation under Ubuntu16.04 #2

Open
jianlianggao wants to merge 9 commits into BioMedIA:master from jianlianggao:feature/ubuntu16.04

@jianlianggao

These files can be used to install Slurm nodes on Ubuntu 16.04. The Slurm master node is biomedia03.

Member

@jopasserat jopasserat left a comment

A couple of changes; the most critical are:

  • use Jinja templates
  • use Pillar

{% set rocsList = [] %}
{% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %} {% if node.startswith('roc') %} {% set rocsListTrash = rocsList.append(node) %} {% endif %} {% endfor %}
NodeName=biomedia01 RealMemory=64000 CPUs=24 State=UNKNOWN

Member

this should be part of the Jinja template
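
As a rough sketch of what that could look like (not the PR's actual template), the node lines might be generated from Pillar instead of being listed by hand. The lookup `pillar['slurm']['nodes']['batch']['cpus']` is the one already used to build `rocsList`; the `mem` and `cores` keys are hypothetical and would have to match whatever the Pillar data really stores per node. GPU nodes such as monal01 (with `Gres=`) are not covered here.

```jinja
{# Sketch only. Assumes each node maps to a dict with the hypothetical
   keys 'mem' and 'cores'; adjust to the real Pillar layout. #}
{% for node, values in pillar['slurm']['nodes']['batch']['cpus'] | dictsort %}
NodeName={{ node }} RealMemory={{ values['mem'] }} CPUs={{ values['cores'] }} State=UNKNOWN
{% endfor %}
```

`dictsort` is only there to keep the rendered file stable between runs; a plain `.items()` loop would work too.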

NodeName=monal01 RealMemory=80000 CPUs=12 Gres=gpu:8 State=UNKNOWN

# Partitions
PartitionName=long Nodes=biomedia01,biomedia02,biomedia05,biomedia06,biomedia07,biomedia08,biomedia09,biomedia10,roc01,roc02,roc03 Default=YES MaxTime=43200
Member

All roc machines should be in long as well. It's fine to have two overlapping partitions.
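
One way to keep `long` in sync with the node list, sketched under the assumption that every key under `pillar['slurm']['nodes']['batch']['cpus']` (roc machines included) belongs in `long`. If that Pillar key holds other nodes as well, the list would need filtering first.

```jinja
{# Sketch only: assumes all batch nodes in Pillar, roc machines included,
   should be members of the 'long' partition. #}
PartitionName=long Nodes={{ pillar['slurm']['nodes']['batch']['cpus'].keys() | sort | join(',') }} Default=YES MaxTime=43200
```

The roc-only short partition can keep listing the same roc nodes alongside this.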

#AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost={{ pillar['slurm']['controller'] }}
DbdHost=biomedia03
Member

no hardcoded values please => use Pillar

{% endif %}
install slurm packages from local repo:
pkg.installed:
- sources:
Member

nice 👍

# disable SSH for anybody but root
+:root:ALL
-:ALL EXCEPT (csg) dr jpassera bglocker:ALL
-:ALL EXCEPT (csg) (biomedia) dr jpassera bglocker jgao:ALL
Member

Regular users shouldn't have SSH access to the cluster nodes, hence the previous config.

ConstrainSwapSpace=yes
AllowedSwapSpace=10.0
# Not well supported until Slurm v14.11.4 https://groups.google.com/d/msg/slurm-devel/oKAUed7AETs/Eb6thh9Lc0YJ
#ConstrainDevices=yes
Member

Should that be enabled and not commented out then?

#
# Workaround because Slurm does not recognize full hostname...
ControlMachine={{ pillar['slurm']['controller'] }}
ControlMachine=biomedia03
Member

Pillar

@@ -1,3 +0,0 @@
#!/bin/bash
Member

This script is useful for adding new members. It should stay.

@@ -1,13 +0,0 @@
#!/bin/bash
Member

Script used to bootstrap new minions. Any reason to remove it?

* Slurm
* Slurm Database
* SSH
To install Slurm nodes, you need to copy (on Slurm master node)
Member

Maybe move that lower in an "Instructions" subsection instead of replacing what the formula contains?

@jianlianggao
Author

jianlianggao commented Apr 30, 2018 via email

PartitionName=rocsShort Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=60 Priority=5000
#PartitionName=long Nodes=biomedia01,biomedia02,biomedia03,biomedia05 Default=YES MaxTime=43200
#PartitionName=short Nodes=biomedia01,biomedia03,biomedia05 Default=NO MaxTime=60 Priority=5000
PartitionName=gpus Nodes=monal01 Default=NO MaxTime=10080 MaxCPUsPerNode=4 MaxMemPerNode=30720
Member

@jopasserat jopasserat May 1, 2018

You can remove the MaxCPUsPerNode=4 MaxMemPerNode=30720 settings. They are just legacy code here.
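
With those two settings dropped and everything else left as in the line under review, the partition definition would read roughly as follows:

```jinja
{# gpus partition from the diff, minus the legacy per-node limits #}
PartitionName=gpus Nodes=monal01 Default=NO MaxTime=10080
```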

@jianlianggao
Author

jianlianggao commented May 2, 2018 via email

StoragePass={{ pillar['slurm']['db']['password'] }}
StorageLoc=slurmdb
StorageUser=slurm
StoragePass=1BUy4eVv7X
Member

password in cleartext in the commit history...
