Exercise - basics#
Timing
45 min
Goals
- Get more familiar with the command line
- Get to know the sbatch script
- Get to know job submission
- Interactive -> non-interactive
Prerequisites
- Access to the Puhti web interface
- Own directory within the course directory: /scratch/project_20xxxxx/students/cscusername
Batch job tutorial - Interactive jobs#
These examples are done on Puhti. When using the web interface, you can open a compute node shell directly.
In an interactive batch job, an interactive shell session is launched on a compute node, for which one can request specific resources (time, memory, cores, disk).
Launching an interactive job / compute node shell#
Observe how you now need to define the resources you want to reserve. Let’s reserve 10 minutes.
Other ways of starting an interactive session
On the login node, start an interactive job with srun, e.g.:
srun --time=00:10:00 --pty --account=project_20xxxxx --partition=interactive bash # replace xxxxx with your project number; for the course you can also add --reservation=geocomputing_wed (not available at other times) and change the partition to small
Alternatively, on Puhti you can use the sinteractive wrapper to start an interactive session from the login node; it simplifies the call and asks you for the resources step by step:
sinteractive -i
or directly:
sinteractive --account project_20xxxxx --time 00:10:00 # replace xxxxx with your CSC project, e.g. project_2001234
Need your project number?
You can check my.csc.fi or list your projects with csc-projects in a login node shell.
Observe how the command prompt (the initial text on each row of the command line) now looks compared to a login node shell: e.g. r07c51, which refers to a compute node, as opposed to e.g. puhti-login11.
Once on the compute node, you can run commands directly from the command line. You can e.g. load the geoconda module:
module load geoconda
Then we can use, for example, gdalinfo to check the details of a raster file:
gdalinfo /appl/data/geo/luke/forest_wind_damage_sensitivity/2017/windmap2017_int1k_metsamaa2_cog.tif
Task
Try out some other command-line tool, or maybe even start a python or R session. What modules do you need to load? Check the CSC Docs pages about “geo” applications.
Quit the interactive batch job with exit.
-> This way you can work interactively for an extended period, e.g. using lots of memory, without creating load on the login nodes.
Note that above we only asked for 10 minutes of time. Once that is up, you will be automatically logged out of the compute node.
Running exit on the login node will log you out from Puhti.
More information on interactive jobs
Documentation at Docs CSC: Interactive usage and CSC Docs: FAQ on CSC batch jobs
Batch job tutorial - Serial jobs#
Examples are done on Puhti. In the Puhti web interface, open a login node shell.
Remember
- A serial program can only use one core (CPU)
- One should request only a single core from SLURM
- The job does not benefit from additional cores
- Excess cores are wasted, since they will not be available to other users
If you use software that is pre-installed by CSC, check its documentation page; it might have a batch job example with useful default settings.
Launching a serial job#
Go to your own directory in the /scratch directory of your project:
cd /scratch/project_20xxxxx/students/cscusername # replace xxxxx with your CSC project number and cscusername with your username
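If your personal directory does not exist yet, mkdir -p creates it together with any missing parent directories. The sketch below uses a temporary stand-in path so it can be tried anywhere; on Puhti you would use the real /scratch/project_20xxxxx path instead:

```shell
# Dry run with a stand-in path (assumption: on Puhti you would replace
# $COURSE_DIR with /scratch/project_20xxxxx).
COURSE_DIR=$(mktemp -d)                # stand-in for the project's scratch directory
ME=${USER:-$(whoami)}                  # your username (cscusername on Puhti)
mkdir -p "$COURSE_DIR/students/$ME"    # -p also creates missing parent directories
cd "$COURSE_DIR/students/$ME"
pwd                                    # confirm your current location
```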
Create a file called my_serial.bash, e.g. with the nano text editor:
nano my_serial.bash
Copy the following batch script there and change xxxxx to the CSC project you actually want to use:
#!/bin/bash
#SBATCH --account=project_20xxxxx # Choose the billing project. Has to be defined!
#SBATCH --time=00:02:00 # Maximum duration of the job. Upper limit depends on the partition.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --ntasks=1 # Number of tasks. Upper limit depends on partition. For a serial job this should be set to 1!
echo -n "We are running on " # -n omits the newline, so the node name prints on the same line
hostname # Run hostname-command, that will print the name of the Puhti compute node that has been allocated for this particular job
sleep 60 # Run sleep-command, to keep the job running for an additional 60 seconds, in order to have time to monitor the job
echo "This job has finished"
In the batch job example above we are requesting:
- one core (--ntasks=1)
- for two minutes (--time=00:02:00)
- from the test queue (--partition=test)
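Since the #SBATCH lines are ordinary bash comments, the command section of the script can also be dry-run locally outside SLURM to check that it does what you expect. The sleep is shortened below so the check finishes quickly:

```shell
#!/bin/bash
# Local dry run of the script's command section: outside SLURM, the
# #SBATCH lines are plain comments and are simply ignored by bash.
echo -n "We are running on "   # -n omits the newline, so hostname prints on the same line
hostname                       # name of the current machine
sleep 1                        # shortened from 60 s for a quick local check
echo "This job has finished"
```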
Submit the job to the batch queue and check its status with the commands:
sbatch my_serial.bash
squeue --me
Once the job is completed, check how much of the resources was used with seff jobid (replace jobid with the number that was displayed after you ran the sbatch command).
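To avoid copying the job ID by hand, sbatch's --parsable option prints only the ID, which you can store in a variable and pass to seff later. This is a sketch: it requires a system with SLURM installed, so it is guarded to be a harmless no-op elsewhere:

```shell
# Sketch: capture the job ID at submission time (requires SLURM; guarded
# so the snippet does nothing harmful on machines without it).
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable my_serial.bash)   # --parsable prints only the job ID
    echo "Submitted job $jobid"
    squeue --me                                 # check the job's status
    # ...once the job has completed:
    seff "$jobid"                               # resource-usage summary
else
    echo "sbatch not available - run this on Puhti"
fi
```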
Additional exercises
- Where can you find the output of the hostname command?
- How could you add a name to the job for easier identification?
- What happens if you run the same script from above, but request only one minute and sleep for two minutes?
- Can you run the gdalinfo command from the interactive job above via a non-interactive job? What do you need to change in the sbatch job script?
Solution
- The output appears in slurm-jobid.out in the directory from which you submitted the batch job. You can change that location by specifying it in your batch job script with #SBATCH --output=/your/path/slurm-%j.out.
- Add #SBATCH --job-name=myname to the resource request at the top of your sbatch script to rename the job to “myname”.
- After the job has finished, check the log file with cat slurm-<jobid>.out. You should see an error at the end: slurmstepd: error: *** JOB xxx ON xxx CANCELLED AT xDATE-TIMEx DUE TO TIME LIMIT ***. This means that our job was killed for exceeding the amount of resources requested. Although this appears harsh, it is actually a feature: strict adherence to resource requests allows the scheduler to find the best possible place for your jobs, and it ensures fair use of the computing resources.
- Since gdalinfo is quite a fast command, you only need to change the script part of your sbatch script; the resource request can stay the same. First make gdal available within the job with module load geoconda, then run the gdalinfo command. After the job is done, you can find the output again in the slurm-jobid.out file:
#!/bin/bash
#SBATCH --account=<project> # Choose the billing project. Has to be defined!
#SBATCH --time=00:02:00 # Maximum duration of the job. Upper limit depends on the partition.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --ntasks=1 # Number of tasks. Upper limit depends on partition. For a serial job this should be set to 1!
module load geoconda
gdalinfo /appl/data/geo/luke/forest_wind_damage_sensitivity/2017/windmap2017_int1k_metsamaa2_cog.tif
Key points
- A batch job script combines resource estimates and computation steps
- Resource request lines start with #SBATCH
- You can find the job’s output and errors in slurm-jobid.out