12 changes: 6 additions & 6 deletions README.md
@@ -3,9 +3,9 @@
 Salt formula provisioning a Slurm cluster
 
-Availables states:
-* Munge
-* Screen
-* Slurm
-* Slurm Database
-* SSH
+To install Slurm nodes, you need to copy (on the Slurm master node):
Member comment: Maybe move that lower in an "Instructions" subsection instead of replacing what the formula contains?


+- munge.key from /etc/munge/munge.key to /srv/salt/munge.key
+- slurm.cert from /etc/slurm-llnl/slurm.cert to /srv/salt/slurm.cert
+- slurm.conf from files/etc/slurm-llnl/slurm.conf to /srv/salt/slurm.conf
+- create empty cgroup.conf and gres.conf in /srv/salt/
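
A minimal shell sketch of those four steps (assumes /srv/salt is the Salt file root, as the paths above suggest; run from the formula checkout on the master node):

# run on the Slurm master node, from the formula checkout
cp /etc/munge/munge.key /srv/salt/munge.key
cp /etc/slurm-llnl/slurm.cert /srv/salt/slurm.cert
cp files/etc/slurm-llnl/slurm.conf /srv/salt/slurm.conf
touch /srv/salt/cgroup.conf /srv/salt/gres.conf    # create the empty configs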
5 changes: 0 additions & 5 deletions files/etc/apt/preferences.d/disable-utopic-policy

This file was deleted.

5 changes: 0 additions & 5 deletions files/etc/apt/preferences.d/slurm-utopic-policy

This file was deleted.

1 change: 0 additions & 1 deletion files/etc/apt/sources.list.d/utopic.list

This file was deleted.

1 change: 1 addition & 0 deletions files/etc/apt/sources.list.d/xenial.list
@@ -0,0 +1 @@
+deb http://archive.ubuntu.com/ubuntu xenial main restricted universe multiverse
6 changes: 0 additions & 6 deletions files/etc/default/munge

This file was deleted.

2 changes: 1 addition & 1 deletion files/etc/security/access.conf
@@ -9,4 +9,4 @@
 # who wants to be able to SSH in as root via public-key on Biomedia servers.
 # disable SSH for anybody but root
 +:root:ALL
--:ALL EXCEPT (csg) dr jpassera bglocker:ALL
+-:ALL EXCEPT (csg) (biomedia) dr jpassera bglocker jgao:ALL
Member comment: regular users shouldn't have SSH access to the cluster nodes, hence the previous config

2 changes: 0 additions & 2 deletions files/etc/slurm-llnl/bardolph/gres.conf

This file was deleted.

9 changes: 5 additions & 4 deletions files/etc/slurm-llnl/cgroup.conf
@@ -16,7 +16,8 @@ CgroupReleaseAgentDir=/var/spool/slurm-llnl/cgroup
 ConstrainCores=yes
 TaskAffinity=yes
 #ConstrainRAMSpace=no
-### not used yet
-#ConstrainDevices=no
-#AllowedDevicesFile=/etc/slurm-llnl/cgroup_allowed_devices_file.conf
-
+ConstrainSwapSpace=yes
+AllowedSwapSpace=10.0
+# Not well supported until Slurm v14.11.4 https://groups.google.com/d/msg/slurm-devel/oKAUed7AETs/Eb6thh9Lc0YJ
+#ConstrainDevices=yes
Member comment: Should that be enabled and not commented out then?

+#AllowedDevicesFile=/etc/slurm-llnl/cgroup_allowed_devices_file.conf
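
For reference, the enabled form would presumably look like the sketch below once the cluster runs Slurm >= 14.11.4 (both keys already appear commented out above; this is not what the PR applies):

ConstrainDevices=yes
AllowedDevicesFile=/etc/slurm-llnl/cgroup_allowed_devices_file.conf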
8 changes: 0 additions & 8 deletions files/etc/slurm-llnl/monal01/gres.conf

This file was deleted.

6 changes: 0 additions & 6 deletions files/etc/slurm-llnl/monal02/gres.conf

This file was deleted.

29 changes: 15 additions & 14 deletions files/etc/slurm-llnl/slurm.conf
@@ -31,6 +31,7 @@ JobCredentialPublicCertificate=/etc/slurm-llnl/slurm.cert
 #Licenses=foo*4,bar
 MailProg=/usr/bin/mail
 MaxJobCount=25000
+MaxArraySize=32000
 #MaxStepCount=40000
 #MaxTasksPerNode=128
 MpiDefault=none
@@ -119,9 +120,9 @@ PreemptMode=OFF
 #
 # LOGGING AND ACCOUNTING
 DefaultStorageType=slurmdbd
-DefaultStorageUser={{ pillar['slurm']['db']['user'] }}
+DefaultStorageUser=slurm
 #DefaultStorageLoc=/var/log/slurm-llnl/job_completions.log
-DefaultStorageHost={{ pillar['slurm']['controller'] }}
+DefaultStorageHost=biomedia03
 DefaultStoragePort=6819
 AccountingStorageEnforce=associations,limits
 #AccountingStorageHost=
@@ -132,11 +133,11 @@ AccountingStorageEnforce=associations,limits
 #AccountingStorageUser=
 AccountingStoreJobComment=YES
 ClusterName=biomediacluster
-#DebugFlags=
-#JobCompHost={{ pillar['slurm']['controller'] }}
+DebugFlags=Gres
+#JobCompHost=biomedia03
 #JobCompLoc=
-#JobCompUser={{ pillar['slurm']['db']['user'] }}
-#JobCompPass={{ pillar['slurm']['db']['password'] }}
+#JobCompUser=slurm
+#JobCompPass=1BUy4eVv7X
 #JobCompPort=
 JobCompType=jobcomp/none
 JobAcctGatherFrequency=30
@@ -164,6 +165,7 @@ SlurmSchedLogFile=/var/log/slurm-llnl/sched.log
 #
 #
 # GRes configuration
+# GresTypes=gpu
 GresTypes={{ ','.join(pillar['slurm']['gres']) }}
 # COMPUTE NODES
 {% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %}
@@ -177,13 +179,12 @@ NodeName={{ node }} RealMemory={{ values.mem }} CPUs={{ values.cores }} Gres={{
 {% endfor %}
 # Partitions
 PartitionName=long Nodes={{ ','.join(pillar['slurm']['nodes']['batch']['cpus']) }} Default=YES MaxTime=43200
-PartitionName=short Nodes={{ ','.join(pillar['slurm']['nodes']['batch']['cpus']) }} Default=NO MaxTime=60 Priority=5000
-PartitionName=gpus Nodes={{ ','.join(pillar['slurm']['nodes']['batch']['gpus']) }} Default=NO MaxTime=10080
-PartitionName=interactive Nodes={{ ','.join(pillar['slurm']['nodes']['interactive']['cpus']) }} Default=NO MaxTime=4320 Priority=7000 PreemptMode=OFF
+#PartitionName=short Nodes={{ ','.join(pillar['slurm']['nodes']['batch']['cpus']) }} Default=NO MaxTime=60 Priority=5000
+PartitionName=gpus Nodes={{ ','.join(pillar['slurm']['nodes']['batch']['gpus']) }} Default=NO MaxTime=10080 MaxCPUsPerNode=4 MaxMemPerNode=30720
+#PartitionName=interactive Nodes={{ ','.join(pillar['slurm']['nodes']['interactive']['cpus']) }} Default=NO MaxTime=4320 Priority=7000 PreemptMode=OFF
 
-{% set rocsList = [] %}
-{% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %} {% if node.startswith('roc') %} {% set rocsListTrash = rocsList.append(node) %} {% endif %} {% endfor %}
-
-PartitionName=rocsLong Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=43200
-PartitionName=rocsShort Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=60 Priority=5000
+#{% set rocsList = [] %}
+#{% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %} {% if node.startswith('roc') %} {% set rocsListTrash = rocsList.append(node) %} {% endif %} {% endfor %}
 
+#PartitionName=rocsLong Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=43200
+#PartitionName=rocsShort Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=60 Priority=5000
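
For context, a sketch of what the fully shown template lines above render to, assuming hypothetical pillar data gres: [gpu] and batch CPU nodes roc01 and roc02:

GresTypes=gpu
PartitionName=long Nodes=roc01,roc02 Default=YES MaxTime=43200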
10 changes: 5 additions & 5 deletions files/etc/slurm-llnl/slurmdbd.conf
@@ -6,7 +6,7 @@ ArchiveSuspend=no
 #ArchiveScript=/usr/sbin/slurm.dbd.archive
 #AuthInfo=/var/run/munge/munge.socket.2
 AuthType=auth/munge
-DbdHost={{ pillar['slurm']['controller'] }}
+DbdHost=biomedia03
Member comment: no hardcoded values please => use Pillar
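
A sketch of the Pillar data the removed Jinja references (key names are taken from the templates in this diff; values are placeholders):

# pillar/slurm.sls -- hypothetical layout
slurm:
  controller: biomedia03       # consumed as {{ pillar['slurm']['controller'] }}
  db:
    name: slurmdb
    user: slurm
    password: changeme         # placeholder; keep the real value out of the repo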

 DbdPort=6819
 DebugLevel=info
 PurgeEventAfter=1month
@@ -16,10 +16,10 @@ PurgeSuspendAfter=1month
 LogFile=/var/log/slurm-llnl/slurmdbd.log
 PidFile=/var/run/slurm-llnl/slurmdbd.pid
 SlurmUser=slurm
-#StorageHost={{ pillar['slurm']['controller'] }}
+#StorageHost=biomedia03
 StorageHost=localhost
 StorageType=accounting_storage/mysql
 StoragePort=3306
-StorageLoc={{ pillar['slurm']['db']['name'] }}
-StorageUser={{ pillar['slurm']['db']['user'] }}
-StoragePass={{ pillar['slurm']['db']['password'] }}
+StorageLoc=slurmdb
+StorageUser=slurm
+StoragePass=1BUy4eVv7X
Member comment: password in cleartext in the commit history...
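
One way to keep the secret out of version control would be Salt's GPG pillar renderer (a sketch, assuming GPG is set up on the master; the ciphertext is a placeholder):

#!yaml|gpg
slurm:
  db:
    password: |
      -----BEGIN PGP MESSAGE-----
      (encrypted value here)
      -----END PGP MESSAGE-----

The value shown above is already in the commit history, so it would also need to be rotated regardless of how it is stored going forward.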

3 changes: 3 additions & 0 deletions get_slurm_ver.sls
@@ -0,0 +1,3 @@
+get slurmd ver:
+  cmd.run:
+    - name: dpkg -s slurmd |grep "^Version:" > /tmp/local_slurm_ver.txt
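
Applying just this state from the master would look something like this (state name inferred from the file path; the '*' targeting is illustrative) and leaves each minion's installed slurmd version in /tmp/local_slurm_ver.txt:

salt '*' state.sls get_slurm_ver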