In this tutorial we'll get familiar with the basic usage of the Slurm batch queue system at CSC
- The goal is to learn how to request resources that match the needs of a job
💬 A batch job consists of two parts: resource requests and the job step(s)
☝🏻 Examples are done on Puhti. If using the web interface, open a login node shell.
💬 A serial program can only use one core (CPU)
💬 Within the job (or allocation), the actual program is launched using the command srun
☝🏻 If you use software that is pre-installed by CSC, please check its documentation page; it might have a batch job example with useful default settings.
1. Go to the /scratch directory of your project:

    cd /scratch/<project>   # replace <project> with your CSC project, e.g. project_2001234

   - 💡 You can list your projects with csc-projects
   - 💡 Note! If you're using a project with other members (like the course project), first make a subdirectory for yourself (e.g. mkdir $USER) and then move there (cd $USER), so that you do not clutter the /scratch root of your project

2. Create a file called my_serial.bash, e.g. with the nano text editor:

    nano my_serial.bash

3. Copy the following batch script there and change <project> to the CSC project you actually want to use:

    #!/bin/bash
    #SBATCH --account=<project>   # Choose the billing project. Has to be defined!
    #SBATCH --time=00:02:00       # Maximum duration of the job. Upper limit depends on the partition.
    #SBATCH --partition=test      # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
    #SBATCH --ntasks=1            # Number of tasks. Upper limit depends on the partition. For a serial job this should be set to 1!

    srun hostname                 # Run the hostname command
    srun sleep 60                 # Run the sleep command

4. Submit the job to the batch queue and check its status:

    sbatch my_serial.bash
    squeue -u $USER
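When a submission succeeds, sbatch prints a line of the form "Submitted batch job <jobid>". The sketch below shows one possible way to capture that ID so it can be reused with squeue. The helper name extract_jobid and the sample ID 123456 are made up for illustration, and the script only contacts Slurm when sbatch and my_serial.bash are actually present.

```shell
#!/bin/bash
# extract_jobid is a hypothetical helper, not part of Slurm.
# It returns the last space-separated word of the line given as $1,
# which for "Submitted batch job 123456" is the job ID.
extract_jobid() {
    echo "${1##* }"
}

if command -v sbatch >/dev/null 2>&1 && [ -f my_serial.bash ]; then
    # On a login node: submit the job and show only that job in the queue.
    jobid=$(extract_jobid "$(sbatch my_serial.bash)")
    squeue -j "$jobid"
else
    # Elsewhere: demonstrate the parsing on a sample line (made-up ID).
    extract_jobid "Submitted batch job 123456"
fi
```

Alternatively, sbatch has a --parsable option that prints the job ID without the surrounding text, which may be more convenient in scripts.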
💬 In the batch job example above we are requesting
- one core (--ntasks=1)
- for two minutes (--time=00:02:00)
- from the test queue (--partition=test)

💬 We want to run the program hostname, which prints the name of the Puhti compute node that has been allocated for this particular job

💬 In addition, we are running the sleep program to keep the job running for an additional 60 seconds, in order to have time to monitor the job
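To see how the requested time relates to what the job actually does, the following sketch converts the --time value from the example into seconds and compares it with the 60-second sleep. The helper to_seconds is a made-up name, and it assumes the limit is written as HH:MM:SS, as in the example script; other time formats are not handled.

```shell
#!/bin/bash
# to_seconds is a hypothetical helper: it converts HH:MM:SS into seconds.
# awk reads fields such as "02" as decimal numbers, so leading zeros are safe.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

limit=$(to_seconds 00:02:00)   # value of --time in my_serial.bash
runtime=60                     # seconds spent in "sleep 60"

echo "time limit: ${limit} s, expected run time: about ${runtime} s"
if [ "$runtime" -lt "$limit" ]; then
    echo "The request leaves headroom for the job to finish."
fi
```

The same comparison is worth doing for real jobs: the time request should comfortably exceed the expected run time, since the limit is the maximum duration of the job.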
- By default, the output is written to a file named slurm-<jobid>.out, where <jobid> is a unique job ID assigned to the job by Slurm
- Check the efficiency of the job compared to the reserved resources with the command seff <jobid> (replace <jobid> with the actual Slurm job ID)

💭 You can get a list of all your jobs that are running or queuing with the command squeue -u $USER

🗯 A submitted job can be cancelled using the command scancel <jobid>
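The checks described above can be sketched as a small script. The job ID 123456 is a placeholder invented for this example; replace it with the ID that Slurm reported for your own job. The seff call is guarded so that the script also runs on a machine without Slurm tools.

```shell
#!/bin/bash
# Hypothetical job ID used only for illustration; replace it with your own.
jobid=123456

# By default the output of a batch job goes to slurm-<jobid>.out
# in the directory the job was submitted from.
outfile="slurm-${jobid}.out"

if [ -f "$outfile" ]; then
    cat "$outfile"
else
    echo "No file named $outfile in $(pwd)"
fi

# Report how well the job used its reserved resources, if seff is installed.
if command -v seff >/dev/null 2>&1; then
    seff "$jobid"
fi
```

For the example job, the output file should contain the name of the compute node printed by hostname.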
💡 FAQ on CSC batch jobs in Docs CSC