In this tutorial we'll get familiar with the basic usage of the Slurm batch queue system at CSC
- The goal is to learn how to request resources that match the needs of a job
💬 A batch job consists of two parts: resource requests and the job step(s)
☝🏻 Examples are done on Puhti. If using the web interface, open a login node shell.
💬 A parallel program is capable of utilizing several cores and other resources simultaneously for the same job
💬 The aim of a parallel program is to solve a problem (job) faster and to tackle larger problems that would be intractable to run on a single core
💡 There are two major approaches to dividing a computational burden over several cores: shared-memory threading within a single node (OpenMP) and message passing between tasks that can span multiple nodes (MPI); both are covered below
☝🏻 Note! You need to have an MPI module loaded when running parallel batch jobs. If you get an error saying error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory, try module load StdEnv to load the default environment (or load a specific MPI module, e.g. openmpi).
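A minimal sketch of checking and restoring the environment before submitting; module list and module load are the standard environment-module commands on Puhti, but the exact module names available to you may differ:
module list                # show the currently loaded modules
module load StdEnv         # restore the default environment, including MPI
# or load a specific MPI implementation instead, e.g.
# module load openmpi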
💬 An OpenMP-enabled program uses threads to take advantage of multiple cores that share the same memory on a single node
💬 Go to the /scratch directory of your project:
cd /scratch/<project>/$USER # replace <project> with your CSC project, e.g. project_2001234
💡 You can list your projects with csc-projects
💬 Download a pre-compiled example OpenMP program and make it executable:
wget https://a3s.fi/hello_omp.x/hello_omp.x
chmod +x hello_omp.x
💬 Create a batch job script called my_parallel_omp.bash and change <project> to the CSC project you actually want to use:
#!/bin/bash
#SBATCH --account=<project> # Choose the billing project. Has to be defined!
#SBATCH --time=00:00:10 # Maximum duration of the job. Upper limit depends on partition.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --ntasks=1 # Number of tasks. Upper limit depends on partition.
#SBATCH --cpus-per-task=4 # How many processors work on one task. Max: Number of CPUs per node.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun hello_omp.x
💬 Submit the job:
sbatch my_parallel_omp.bash
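You can follow the job while it queues and runs with squeue (the same command is mentioned again at the end of this tutorial):
squeue -u $USER            # list all your running and queuing jobs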
💬 In the batch job example above we are requesting:
- one task (--ntasks=1)
- four cores for the task (--cpus-per-task=4)
- 10 seconds of runtime (--time=00:00:10)
- the test partition (--partition=test)
💬 We want to run the program hello_omp.x, which is able to utilize four cores
📝 Exporting the environment variable OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK will tell the program that it can use four threads
🎯 Each of the four threads launched by hello_omp.x will print its own output
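If you want to double-check the mapping between the Slurm request and the OpenMP thread count, one option is to add a couple of echo lines to the batch script just before the srun line (an optional addition, not part of the example above):
echo "Slurm granted $SLURM_CPUS_PER_TASK CPUs for this task"
echo "OpenMP will use $OMP_NUM_THREADS threads"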
💬 When finished, the output file slurm-<jobid>.out should contain the results printed by each of the four OpenMP threads
ls
cat slurm-<jobid>.out # replace <jobid> with the actual Slurm job ID
For example:
cat slurm-5118404.out
Hello from thread: 0
Hello from thread: 3
Hello from thread: 2
Hello from thread: 1
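You can also check how efficiently the finished job used its resources with seff, which is introduced again near the end of this tutorial; the job ID below is just the one from the example output above:
seff 5118404               # replace with your own Slurm job ID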
💬 An MPI-enabled program can take advantage of resources that are spread over multiple compute nodes
💬 Download a pre-compiled example MPI program and make it executable:
wget https://a3s.fi/hello_mpi.x/hello_mpi.x
chmod +x hello_mpi.x
💬 Create a batch job script called my_parallel.bash and change <project> to the CSC project you actually want to use:
#!/bin/bash
#SBATCH --account=<project> # Choose the billing project. Has to be defined!
#SBATCH --time=00:00:10 # Maximum duration of the job. Upper limit depends on partition.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --nodes=2 # Number of compute nodes. Upper limit depends on partition.
#SBATCH --ntasks-per-node=4 # How many tasks to launch per node. Depends on the number of cores and memory on a node.
srun hello_mpi.x
💬 Submit the job:
sbatch my_parallel.bash
💬 In the batch job example above we are requesting:
- two nodes (--nodes=2)
- four tasks per node (--ntasks-per-node=4)
- 10 seconds of runtime (--time=00:00:10)
- the test partition (--partition=test)
💬 We want to run the program hello_mpi.x, which will, based on the resource request, start 8 simultaneous tasks
💬 Each of the 8 tasks launched by hello_mpi.x will report its rank and the node on which it ran
💬 When finished, the output file slurm-<jobid>.out will contain output from hello_mpi.x showing how the 8 tasks were distributed over the two reserved nodes
cat slurm-<jobid>.out # replace <jobid> with the actual Slurm job ID
Hello world from node r07c01.bullx, rank 0 out of 8 tasks
Hello world from node r07c02.bullx, rank 5 out of 8 tasks
Hello world from node r07c02.bullx, rank 7 out of 8 tasks
Hello world from node r07c01.bullx, rank 2 out of 8 tasks
Hello world from node r07c02.bullx, rank 4 out of 8 tasks
Hello world from node r07c01.bullx, rank 3 out of 8 tasks
Hello world from node r07c01.bullx, rank 1 out of 8 tasks
Hello world from node r07c02.bullx, rank 6 out of 8 tasks
💬 In this example the tasks ran on two nodes (r07c01.bullx, r07c02.bullx), four tasks on each
💡 You can check the resource usage and efficiency of the completed job with seff <jobid> (replace <jobid> with the actual Slurm job ID)
🎯 Note! This example asks for 4 cores from each of the 2 nodes. Normally this would not make sense; instead it would be better to run all 8 cores on the same node (on Puhti one node has 40 cores!). Typically, you want your resources (cores) to be spread over as few nodes as possible to avoid unnecessary communication between nodes.
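As a sketch of what the note suggests, the same 8 MPI tasks could be requested from a single node by changing only the resource request lines in the batch script; everything else stays the same:
#SBATCH --nodes=1            # a single compute node
#SBATCH --ntasks-per-node=8  # all 8 MPI tasks on the same node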
💡 FAQ on CSC batch jobs in Docs CSC
📝 You can get a list of all your jobs that are running or queuing with the command squeue -u $USER
📝 A submitted job can be cancelled using the command scancel <jobid>
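If needed, scancel also accepts a user filter instead of a job ID; this cancels every job you have queued or running, so use it with care (a standard Slurm option, not specific to this tutorial):
scancel -u $USER           # cancel all of your own jobs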