Psst, remember the cheatsheet!

Job types

Interactive jobs

You have already got to know the interactive web interface for Puhti.

  • If you have a heavier job that still requires an interactive response (e.g. testing, prototyping)

    • Allocate the resources via the interactive partition

    • This way your work is performed on a compute node, not on the login node

Disadvantages of interactive jobs:

  • Blocks your shell until it finishes

  • If the connection is interrupted, the job is lost.

    • Note: With a persistent compute node shell from the web interface, or with the Linux tool screen, it is possible to keep a job running after closing the terminal.

Apart from interactive jobs, a job can be classified as serial, parallel or GPU, depending on the main resource requested. A serial job is the simplest type of job, whereas parallel and GPU jobs may require more advanced methods to fully utilise their capacity.

Serial jobs

A serial job means that the computer works on only one task at a time, following a sequence of instructions and using only one core.

Why would your serial job benefit from being executed using CSC’s resources instead of on your own computer?

  • Part of a larger workflow

  • Avoid data transfer between CSC and your own computer

  • Data sharing with other project members

  • Readily configured environment / dependencies (e.g. R environment on Puhti)

  • Memory and/or disk demands

Parallel jobs

A parallel job distributes the work over several cores in order to achieve a shorter wall time (and/or a larger amount of allocatable memory).

In this course we will focus on embarrassingly/naturally/delightfully parallel processes, using either methods that are built into the tools or tools that can start multiple jobs from one call. For more advanced usage, there are two major parallelization schemes: OpenMP and MPI.
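As a rough illustration of an embarrassingly parallel workload, the sketch below spreads independent calls over several cores with Python's standard multiprocessing module. The function process_tile and the tile numbering are hypothetical placeholders, not part of the course material. The worker count is read from SLURM_CPUS_PER_TASK, which Slurm sets when the job requests --cpus-per-task; outside such a job the sketch falls back to one worker.

```python
import os
from multiprocessing import Pool


def allocated_cores(default=1):
    """Number of cores Slurm reports for this task, or `default` outside Slurm."""
    return int(os.environ.get("SLURM_CPUS_PER_TASK", default))


def process_tile(tile_id):
    """Hypothetical placeholder for real per-tile work; no call depends on another."""
    return tile_id * tile_id


def run_parallel(tile_ids, n_workers):
    """Spread the independent calls over n_workers processes, keeping input order."""
    with Pool(processes=n_workers) as pool:
        return pool.map(process_tile, tile_ids)


if __name__ == "__main__":
    print(run_parallel(range(8), allocated_cores()))
```

Because no tile needs the result of another, the work can be divided without any communication between the processes, which is what makes this kind of task easy to parallelize.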

Array jobs

Array jobs are one way of taking advantage of Puhti’s parallel processing capabilities for embarrassingly parallel tasks. Array jobs are useful when the same code is executed many times for different datasets or with different parameters, without the need to change your code. In a GIS context, a typical use case would be to run a model on a study area that is split into multiple files, where the output from one file has no impact on the result for another.
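Inside an array job, each member typically decides what to work on from its array index. A minimal sketch of that idea, assuming the array is submitted with indices starting at 0 and that the script is given a list of input files (the file names here are made up): Slurm exposes the index in the environment variable SLURM_ARRAY_TASK_ID.

```python
import os


def current_task_id(default=0):
    """Read the array index that Slurm exposes as SLURM_ARRAY_TASK_ID."""
    return int(os.environ.get("SLURM_ARRAY_TASK_ID", default))


def select_input(files, task_id):
    """Return the file for this array member; sorting keeps the mapping stable."""
    return sorted(files)[task_id]


if __name__ == "__main__":
    # Hypothetical file names, one per part of the study area.
    tiles = ["tile_c.tif", "tile_a.tif", "tile_b.tif"]
    print(select_input(tiles, current_task_id()))
```

The same script is run unchanged by every array member; only the index differs, so each member picks a different file.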

Maximum job limits

Submitting an array job of 100 members counts the same as 100 individual jobs from the batch queue system’s perspective. On Puhti, a user can have at most 400 jobs submitted and at most 200 jobs running at the same time (except for the interactive, test and gputest partitions, where the limits are one or two). The number of submitted jobs per user per month should be kept below one thousand.
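One way to stay within such limits is to let each array member process several files instead of one. The arithmetic can be sketched as follows; the helper names are illustrative, and the value passed as max_members is whatever cap you choose (for example the 200 running jobs mentioned above).

```python
import math


def files_per_member(n_files, max_members):
    """Smallest number of files per member that keeps the array within the cap."""
    return math.ceil(n_files / max_members)


def split_into_chunks(items, max_members):
    """Group items into at most max_members chunks of (nearly) equal size."""
    if not items:
        return []
    size = files_per_member(len(items), max_members)
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    chunks = split_into_chunks(list(range(1000)), 200)
    print(len(chunks), len(chunks[0]))
```

With 1000 input files and a cap of 200 members, each member handles 5 files, so the whole array counts as 200 jobs rather than 1000.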

GPU jobs

A GPU is capable of doing certain types of simultaneous calculations very efficiently. To take advantage of this power, a computer program must be written to match how a GPU handles data. For spatial computations on the GPU, see for example RAPIDS cuSpatial. CSC’s GPU resources are relatively scarce and should therefore be used with particular care. A GPU uses 60 times more billing units than a single CPU core. In practice, 1-10 CPU cores (but not more) should be allocated per GPU on Puhti.
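To get a feel for the relative cost, the sketch below expresses a reservation in "CPU-core equivalents". It assumes that the ratio above means one GPU is billed like 60 CPU cores, and it ignores any billing for memory or storage, so it is a comparison aid rather than CSC's actual billing formula. The function names are illustrative.

```python
# Assumption taken from the text: one GPU is billed like 60 CPU cores.
GPU_TO_CORE_RATIO = 60


def core_equivalents(n_gpus, n_cores):
    """Relative cost of a reservation, measured in CPU-core equivalents."""
    return n_gpus * GPU_TO_CORE_RATIO + n_cores


def cores_within_guideline(n_gpus, n_cores, max_cores_per_gpu=10):
    """Check the 1-10 cores per GPU rule of thumb for a GPU reservation."""
    return n_gpus >= 1 and n_gpus <= n_cores <= n_gpus * max_cores_per_gpu


if __name__ == "__main__":
    print(core_equivalents(1, 10))
    print(cores_within_guideline(1, 10))
```

Under this assumption, a job with one GPU and ten cores costs as much per hour as roughly 70 CPU cores, which is why a GPU should only be reserved when the program can actually use it.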

Think about your work

Which job type sounds like it could benefit your work?