Job types#
Interactive jobs#
You already got to know the interactive web interface for Puhti
If you have a heavier job that still requires interactive response (e.g. testing, prototyping)
Allocate the resource via the the interactive partition
This way your work is performed in a compute node, not on the login node
Disadvantages of interactive jobs:
Blocks your shell until it finishes
Connection interruption means that job is gone.
Note: With persistent
compute node shell
from the web interface or using Linux tool screen it is possible to keep a job running while closing the terminal.
Apart from interactive jobs, a job can be classified as serial, parallel or GPU, depending on the main requested resource. A serial job is the simplest type of job whereas parallel and GPU jobs may require some advanced methods to fully utilise their capacity.
Serial jobs#
Serial job means that the computer works on only one task at a time following a sequence of instructions, while only using one core.
Why would your serial job benefit from being executed using CSC’s resources instead of on your own computer?
Part of a larger workflow
Avoid data transfer between CSC and your own computer
Data sharing among other project members
Readily configured environment / dependencies (e.g. R environment on Puhti)
Memory and/or disk demands
Parallel jobs#
A parallel job distributes the work over several cores in order to achieve a shorter wall time (and/or a larger allocatable memory).
In this course we will focus on embarrassingly/naturally/delightfully parallel processes with methods that are either built-in to the tools or tools that can start multiple jobs from one call. For more advanced usage, there are two major parallelization schemes: OpenMP and MPI.
Advanced topics - MPI/OpenMP
What is MPI?
MPI (Message Passing Interface) is a widely used standard for writing software that runs in parallel
MPI utilizes parallel processes that do not share memory
To exchange information, processes pass data messages back and forth between the cores
Communication can be a performance bottleneck
MPI is required when running on multiple nodes
What is OpenMP?
OpenMP (Open Multi-Processing) is a standard that utilizes compute cores that share memory, i.e. threads
They do not need to send messages between each other
OpenMP is easier for beginners, but problems quickly arise with so-called race conditions
This appears when different compute cores process and update the same data without proper synchronization
OpenMP is restricted to a single node
Batch job examples
Multicore OpenMP job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=X
Multicore MPI job
#SBATCH --nodes=X
#SBATCH --ntasks-per-node=Y
#SBATCH --cpus-per-task=1
--cpus-per-task
is typically used for OpenMP jobs--ntasks
is typically used for MPI jobsA task cannot be split between nodes, but tasks can be on different nodes
--ntasks-per-node
can be used for finer control
Self study materials for OpenMP and MPI
There are many tutorials available online
Look with simple searches for e.g. “MPI tutorial”
Check the documented exercise material and model answers from the CSC course “Introduction to Parallel Programming”
Available on GitHub
See also the materials of CSC Summer School in HPC
Array jobs#
Array jobs are one way of taking advantage of Puhti’s parallel processing capabilities for embarrassingly parallel tasks. Array jobs are useful when same code is executed many times for different datasets or with different parameters without the need to change your code. In GIS context, a typical use case would be to run some model on study area split into multiple files where output from one file doesn’t have an impact on the result of another area.
Maximum job limits
Submitting an array job of 100 members counts the same as 100 individual jobs from the batch queue system’s perspective. In Puhti, one can submit/run a maximum of 400/200 jobs at the same time (except for interactive
, test
and gputest
, where the limits are one or two). The number of submitted jobs per user per month should be kept below one thousand.
GPU jobs#
A GPU is capable of doing certain type of simultaneous calculations very efficiently. In order to take advantage of this power, a computer program must be programmed to adapt on how GPU handles data. For spatial computations on the GPU, check out for example RAPIDS cuSpatial. CSC’s GPU resources are relatively scarce and hence should be used with particular care. A GPU uses 60 times more billing units than a single CPU core. In practice, 1-10 CPU cores (but not more) should be allocated per GPU on Puhti.
Advanced topics - GPU
GPUs can speed up jobs
GPUs can be used for science, but are often challenging to program
Not all algorithms can use the full power of GPUs
Check the manual if the software can utilize GPUs, don’t use GPUs if you’re unsure
See our CSC Docs page on how to check if your batch job used GPU
The CSC usage policy limits GPU usage to where it is most efficient
Also, if you process lots of data, make sure you use the disk efficiently
Does your code run on AMD GPUs? LUMI has a massive GPU capacity!
Can your software utilize GPUs?
Think about your work
Which job type sounds like it could benefit your work?