Slurm installation in Ubuntu@WSL

"Slurm, Linux"

Posted by QL on January 27, 2019

Notes on installing Slurm in Ubuntu on WSL

Jan 27th, 2019. Based on reference1

  1. Install munge and slurm: sudo apt install munge slurm-wlm. Executing the commands hostname and slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used when constructing the slurm.conf file; an illustrative line is shown below.
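
     For reference, slurmd -C prints a node-description line roughly of this form (the exact fields vary by Slurm version, and the values here are illustrative, not captured from my machine):

        NodeName=workstation CPUs=36 Boards=1 SocketsPerBoard=1 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=64216
        UpTime=0-01:23:45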

  2. Use a browser (Firefox, Opera, etc.) to open /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html and generate the configuration file. Note: I am using just one node, so I used the host name for ControlMachine, NodeName, and ClusterName when generating the configuration file. The unit of RealMemory seems to be MB, so use 65536, for example, if the node has 64 GB. However, on my first attempt the queue got stuck in the Draining state due to Low RealMemory, so I did not specify RealMemory on my second attempt (a command to recover a drained node is sketched after the configuration file). Finally, sudo vi /etc/slurm-llnl/slurm.conf and copy/paste the configuration from the browser. My configuration file is:
      # slurm.conf file generated by configurator easy.html.
      # Put this file on all nodes of your cluster.
      # See the slurm.conf man page for more information.
      #
      ControlMachine=workstation #<YOUR-HOST-NAME>
      #ControlAddr=
      #
      #MailProg=/bin/mail
      MpiDefault=none
      #MpiParams=ports=#-#
      ProctrackType=proctrack/pgid
      ReturnToService=1
      SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
      #SlurmctldPort=6817
      SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
      #SlurmdPort=6818
      SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
      SlurmUser=slurm
      #SlurmdUser=root
      StateSaveLocation=/var/lib/slurm-llnl/slurmctld
      SwitchType=switch/none
      TaskPlugin=task/none
      #
      #
      # TIMERS
      #KillWait=30
      #MinJobAge=300
      #SlurmctldTimeout=120
      #SlurmdTimeout=300
      #
      #
      # SCHEDULING
      FastSchedule=1
      SchedulerType=sched/builtin
      #SchedulerPort=7321
      SelectType=select/linear
      #
      #
      # LOGGING AND ACCOUNTING
      AccountingStorageType=accounting_storage/none
      #AccountingStoragePass=/var/run/munge/global.socket.2
      ClusterName=workstation #<YOUR-HOST-NAME>
      #JobAcctGatherFrequency=30
      JobAcctGatherType=jobacct_gather/none
      #SlurmctldDebug=3
      SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
      #SlurmdDebug=4
      SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
      #
      #
      # COMPUTE NODES
      NodeName=workstation CPUs=36 Sockets=1 CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
      PartitionName=long Nodes=workstation Default=YES MaxTime=INFINITE State=UP
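
     If the node does get stuck in the Draining state (as with the Low RealMemory issue above), it can be returned to service once the configuration is fixed; a sketch, assuming the node name used in this file:

        sudo scontrol update NodeName=workstation State=RESUME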
    
  3. Enable and start the manager slurmctld: sudo systemctl enable slurmctld and sudo service slurmctld start. Then enable and start the agent slurmd: sudo systemctl enable slurmd and sudo service slurmd start. The full sequence is collected below.
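
     The same sequence as a copy/paste block (service names as packaged by slurm-wlm on Ubuntu):

        sudo systemctl enable slurmctld
        sudo service slurmctld start
        sudo systemctl enable slurmd
        sudo service slurmd start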
  4. Start the munge service: sudo /etc/init.d/munge start. Notes: other common commands are sudo /etc/init.d/munge status and munge -n; see reference2. A quick round-trip check is sketched below.
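
     To verify that munge works end to end, generate a credential and decode it locally (unmunge is not mentioned above, but it ships with the munge package):

        sudo /etc/init.d/munge status   # confirm the daemon is running
        munge -n | unmunge              # create a credential and decode it on the same node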
  5. Check the status of Slurm: sinfo, scontrol show node.
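
     With the configuration above, a healthy single-node cluster should produce sinfo output roughly like this (illustrative, not captured from my machine; the * marks the default partition):

        $ sinfo
        PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
        long*        up   infinite      1   idle workstation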
  6. Create a shell script and make it executable: vi submit.sh
    #!/bin/bash
    sleep 30   # keep the job running long enough to watch it in the queue
    env        # print the job environment, including the SLURM_* variables
    
  7. chmod +x submit.sh and submit the shell script: sbatch submit.sh. Then check the status of the cluster and the queue: sinfo and squeue. At last, check the output after 30 s: cat slurm-<JOBID>.out. An illustrative session is shown below.
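
     An end-to-end example of what such a session might look like (the job ID, user name, and timings are made up for illustration):

        $ sbatch submit.sh
        Submitted batch job 2
        $ squeue
          JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              2      long submit.s       ql  R       0:05      1 workstation
        $ cat slurm-2.out    # after the 30 s sleep finishes
        SLURM_JOB_ID=2
        SLURM_JOB_NODELIST=workstation
        ...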