Notes on installing Slurm in Ubuntu @WSL
Jan 27th, 2019. Based on reference1
- Install munge and slurm:
sudo apt install munge slurm-wlm
Executing the commands hostname and slurmd -C on each compute node will print its name and physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
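For reference, on a node like the one configured below, slurmd -C prints one line of node parameters plus an uptime line, roughly like this (the RealMemory and UpTime values here are made up; yours will differ, and the exact fields vary slightly between Slurm versions):
NodeName=workstation CPUs=36 Boards=1 SocketsPerBoard=1 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=64213
UpTime=0-02:13:47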
- Using a browser (Firefox, Opera, etc.), open /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html and generate the configuration file. Note: I am using just one node, so I used the host name for ControlMachine, NodeName, and ClusterName when generating the configuration file. The unit of RealMemory seems to be MB, so use 65536 if, for example, the node has 64 GB. However, on my first attempt the queue got stuck in the Draining state due to Low Real Memory, so I did not specify RealMemory on my second attempt. Finally, open the configuration file with
sudo vi /etc/slurm-llnl/slurm.conf
and copy/paste the configuration file from the browser. My configuration file is:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=workstation #<YOUR-HOST-NAME>
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/builtin
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
#AccountingStoragePass=/var/run/munge/global.socket.2
ClusterName=workstation #<YOUR-HOST-NAME>
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=4
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
NodeName=workstation CPUs=36 Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
PartitionName=long Nodes=workstation Default=YES MaxTime=INFINITE State=UP
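A hedged suggestion that is not part of the original notes: if slurmctld or slurmd later refuses to start, check that the SlurmdSpoolDir and StateSaveLocation directories named in slurm.conf exist and are writable by SlurmUser (the Ubuntu package usually creates them, but it is worth verifying):
sudo mkdir -p /var/lib/slurm-llnl/slurmd /var/lib/slurm-llnl/slurmctld
sudo chown -R slurm:slurm /var/lib/slurm-llnl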
- Enable and start the manager slurmctld:
sudo systemctl enable slurmctld
sudo service slurmctld start
- Enable and start the agent slurmd:
sudo systemctl enable slurmd
sudo service slurmd start
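To confirm that both daemons actually came up (an extra sanity check, not in the original steps), query their status and, if anything looks wrong, inspect the log files configured in slurm.conf:
sudo service slurmctld status
sudo service slurmd status
sudo tail /var/log/slurm-llnl/slurmctld.log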
- Start the munge service:
sudo /etc/init.d/munge start
Notes: other common commands are
sudo /etc/init.d/munge status
and
munge -n
(see reference2).
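A quick way to verify that munge credentials round-trip correctly (my addition, not from the original notes) is to pipe a freshly generated credential into unmunge, which ships with the munge package; the output should end with STATUS: Success (0):
munge -n | unmunge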
- Check the status of Slurm:
sinfo
and
scontrol show node
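With the single-node configuration above, a healthy cluster shows the node idle in the default partition; sinfo output should look roughly like this (exact column widths vary by Slurm version):
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
long*        up   infinite      1   idle workstation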
- Create a shell script and make it executable:
vi submit.sh
#!/bin/bash
sleep 30
env
chmod +x submit.sh
and submit the shell script:
sbatch submit.sh
Then check the status of the cluster and the queue:
sinfo
and
squeue
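While the job is sleeping, squeue should list it in the running (R) state; a rough sketch of the expected output, with a made-up job id and user name (note that squeue truncates the job name to eight characters by default):
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     2      long submit.s    alice  R       0:07      1 workstation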
At last, check the output after 30 seconds:
cat slurm-<JOBID>.out