Batch job tutorial - Serial jobs
In this tutorial we'll get familiar with the basic usage of the Slurm batch queue system at CSC
- The goal is to learn how to request resources that match the needs of a job
💬 A batch job consists of two parts: resource requests and the job step(s)
☝🏻 Examples are done on Puhti. If using the web interface, open a login node shell.
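To make the two parts concrete, here is a minimal sketch of a batch script (the project name is a placeholder): the `#SBATCH` lines are the resource requests, and the `srun` line is a job step.

```bash
#!/bin/bash
#SBATCH --account=<project>   # resource request: billing project
#SBATCH --partition=test      # resource request: queue
#SBATCH --time=00:01:00       # resource request: run time
#SBATCH --ntasks=1            # resource request: number of tasks

srun hostname                 # job step: the actual program, launched with srun
```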
Serial jobs
💬 A serial program can only use one core (CPU)
- One should request only a single core from Slurm
- The job does not benefit from additional cores
- Excess cores are wasted since they will not be available to other users
💬 Within the job (or allocation), the actual program is launched using the command `srun`
☝🏻 If you use software that is pre-installed by CSC, please check its documentation page; it might have a batch job example with useful default settings.
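For illustration, the same idea in an interactive allocation, assuming standard Slurm `salloc`/`srun` behaviour (the project name and requested time are placeholders; this is a sketch, not a Puhti-specific recommendation):

```bash
salloc --account=<project> --partition=test --ntasks=1 --time=00:10:00   # request the resources
srun hostname   # launch the program as a job step inside the allocation
exit            # relinquish the allocation when done
```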
Launching a serial job
- Go to the `/scratch` directory of your project:

  ```bash
  cd /scratch/<project>   # replace <project> with your CSC project, e.g. project_2001234
  ```

  - Now your input (and output) will be on a shared disk that is accessible to the compute nodes.

  💡 You can list your projects with `csc-projects`

  💡 Note! If you're using a project with other members (like the course project), first make a subdirectory for yourself (e.g. `mkdir $USER`) and then move there (`cd $USER`) to not clutter the `/scratch` root of your project

- Create a file called `my_serial.bash`, e.g. with the `nano` text editor:

  ```bash
  nano my_serial.bash
  ```

- Copy the following batch script there and change `<project>` to the CSC project you actually want to use:

  ```bash
  #!/bin/bash
  #SBATCH --account=<project>   # Choose the billing project. Has to be defined!
  #SBATCH --time=00:02:00       # Maximum duration of the job. Upper limit depends on the partition.
  #SBATCH --partition=test      # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
  #SBATCH --ntasks=1            # Number of tasks. Upper limit depends on the partition. For a serial job this should be set to 1!

  srun hostname                 # Run the hostname command
  srun sleep 60                 # Run the sleep command
  ```

- Submit the job to the batch queue and check its status with the commands:

  ```bash
  sbatch my_serial.bash
  squeue -u $USER
  ```
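💡 If you want to capture the job ID for use in later commands (such as `seff` or `scancel`), one option is Slurm's `--parsable` flag, which makes `sbatch` print only the job ID; a small sketch:

```bash
jobid=$(sbatch --parsable my_serial.bash)   # store the job ID in a shell variable
squeue -j "$jobid"                          # check the status of this particular job
```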
💬 In the batch job example above we are requesting

- one core (`--ntasks=1`)
- for two minutes (`--time=00:02:00`)
- from the test queue (`--partition=test`)
💬 We want to run the program `hostname`, which will print the name of the Puhti compute node that has been allocated for this particular job
💬 In addition, we run the `sleep` program to keep the job running for an additional 60 seconds, so that there is time to monitor the job
Checking the output and the efficiency
- By default, the output is written to a file named `slurm-<jobid>.out`, where `<jobid>` is a unique job ID assigned to the job by Slurm
- Check the efficiency of the job compared to the reserved resources by issuing the command `seff <jobid>` (replace `<jobid>` with the actual Slurm job ID)
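For example, once the job has finished, the output file and the efficiency report could be inspected like this (with `<jobid>` replaced by the ID printed by `sbatch`):

```bash
cat slurm-<jobid>.out   # here: the name of the compute node printed by hostname
seff <jobid>            # resource usage compared to the reserved resources
```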
💡 You can get a list of all your jobs that are running or queuing with the command `squeue -u $USER`
🎯 A submitted job can be cancelled using the command `scancel <jobid>`
More information
💡 FAQ on CSC batch jobs in Docs CSC