Batch job tutorial - Serial jobs

In this tutorial we’ll get familiar with the basic usage of the Slurm batch queue system at CSC

  • The goal is to learn how to request resources that match the needs of a job

πŸ’¬ A batch job consists of two parts: resource requests and the job step(s)

☝🏻 Examples are done on Puhti. If using the web interface, open a login node shell.

Serial jobs

πŸ’¬ A serial program can only use one core (CPU)

  • One should request only a single core from Slurm
  • The job does not benefit from additional cores
  • Excess cores are wasted since they will not be available to other users

πŸ’¬ Within the job (or allocation), the actual program is launched using the command srun

☝🏻 If you use a software that is pre-installed by CSC, please check its documentation page; it might have a batch job example with useful default settings.

Launching a serial job

  1. Go to the /scratch directory of your project:
cd /scratch/<project>      # replace <project> with your CSC project, e.g. project_2001234
  • Now your input (and output) will be on a shared disk that is accessible to the compute nodes.

πŸ’‘ You can list your projects with csc-projects

πŸ’‘ Note! If you’re using a project with other members (like the course project), first make a subdirectory for yourself (e.g. mkdir $USER and then move there (cd $USER) to not clutter the /scratch root of your project)

  1. Create a file called my_serial.bash e.g. with the nano text editor:
nano my_serial.bash
  1. Copy the following batch script there and change <project> to the CSC project you actually want to use:
#SBATCH --account=<project>      # Choose the billing project. Has to be defined!
#SBATCH --time=00:02:00          # Maximum duration of the job. Upper limit depends on the partition. 
#SBATCH --partition=test         # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --ntasks=1               # Number of tasks. Upper limit depends on partition. For a serial job this should be set 1!

srun hostname                    # Run hostname-command
srun sleep 60                    # Run sleep-command
  1. Submit the job to the batch queue and check its status with the commands:
sbatch my_serial.bash
squeue -u $USER

πŸ’¬ In the batch job example above we are requesting

  • one core (--ntasks=1)
  • for two minutes (--time=00:02:00)
  • from the test queue (--partition=test)

πŸ’¬ We want to run the program hostname that will print the name of the Puhti compute node that has been allocated for this particular job

πŸ’¬ In addition, we are running the sleep program to keep the job running for an additional 60 seconds, in order to have time to monitor the job

Checking the output and the efficiency

  • By default, the output is written to a file named slurm-<jobid>.out where <jobid> is a unique job ID assigned to the job by Slurm
  • Check the efficiency of the job compared to the reserved resources by issuing the command seff <jobid> (replace <jobid> with the actual Slurm job ID)

πŸ’­ You can get a list of all your jobs that are running or queuing with the command squeue -u $USER

πŸ—― A submitted job can be cancelled using the command scancel <jobid>

More information

πŸ’‘ FAQ on CSC batch jobs in Docs CSC