Exercise - basics#
Timing
45 min
Goals
- Get more familiar with the command line
- Get to know the sbatch script
- Get to know job submission
- Interactive -> non-interactive
Prerequisites
- Access to the Puhti web interface
- Own directory within the course directory: /scratch/project_20xxxxx/students/cscusername
Batch job tutorial - Interactive jobs#
These examples are done on Puhti. When using the web interface, you can open a compute node shell directly.
In an interactive batch job, an interactive shell session is launched on a compute node, for which one can request specific resources (time, memory, cores, disk).
Launching an interactive job / compute node shell#
Observe how you now need to define the resources you want to reserve. Let’s reserve 10 minutes.
Other ways of starting an interactive session
On the login node, start an interactive job with srun, e.g.:
srun --time=00:10:00 --pty --account=project_20xxxxx --partition=interactive bash # replace xxxxx with your project number; for the course you can also add --reservation=geocomputing_wed (not available at other times) and change the partition to small
Alternatively, on Puhti you can use the sinteractive wrapper to start an interactive session from the login node; it simplifies the call and asks you for the resources step by step:
sinteractive -i
or directly:
sinteractive --account project_20xxxxx --time 00:10:00 # replace xxxxx with your CSC project, e.g. project_2001234
Need your project number?
You can check my.csc.fi or list your projects with csc-projects in a login node shell.
Observe how the command prompt (the initial text on each row of the command line) now looks compared to a login node shell: e.g. r07c51, which refers to a compute node, as opposed to e.g. puhti-login11.
Once on the compute node, you can run commands directly from the command line. You can e.g. load the geoconda module:
module load geoconda
Then we can use, for example, gdalinfo to check the details of a raster file:
gdalinfo /appl/data/geo/luke/forest_wind_damage_sensitivity/2017/windmap2017_int1k_metsamaa2_cog.tif
Task
Try out some other command-line tool, or maybe even start a python or R session. What modules do you need to load? Check the CSC Docs pages about “geo” applications.
Quit the interactive batch job with exit.
-> This way you can work interactively for an extended period, e.g. using lots of memory, without creating load on the login nodes.
Note that above we only asked for 10 minutes of time. Once that is up, you will be automatically logged out of the compute node.
Running exit on the login node will log you out from Puhti.
More information on interactive jobs
Documentation at Docs CSC: Interactive usage and CSC Docs: FAQ on CSC batch jobs
Batch job tutorial - Serial jobs#
Examples are done on Puhti. In the Puhti web interface, open a login node shell.
Remember
- A serial program can only use one core (CPU)
- One should request only a single core from SLURM
- The job does not benefit from additional cores
- Excess cores are wasted, since they will not be available to other users
If you use software that is pre-installed by CSC, check its documentation page; it might have a batch job example with useful default settings.
Launching a serial job#
Go to your own directory in the /scratch directory of your project:
cd /scratch/project_20xxxxx/students/cscusername # replace xxxxx with your CSC project number and cscusername with your username
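If your personal directory does not exist yet, mkdir -p creates it together with any missing parent directories. The sketch below uses a temporary stand-in path so it can be tried anywhere; on Puhti you would use the real /scratch/project_20xxxxx path instead:

```shell
# Dry run with a stand-in path (assumption: on Puhti you would replace
# $COURSE_DIR with /scratch/project_20xxxxx).
COURSE_DIR=$(mktemp -d)                # stand-in for the project's scratch directory
ME=${USER:-$(whoami)}                  # your username (cscusername on Puhti)
mkdir -p "$COURSE_DIR/students/$ME"    # -p also creates missing parent directories
cd "$COURSE_DIR/students/$ME"
pwd                                    # confirm your current location
```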
Create a file called my_serial.bash, e.g. with the nano text editor:
nano my_serial.bash
Copy the following batch script there and change xxxxx to the CSC project you actually want to use:
#!/bin/bash
#SBATCH --account=project_20xxxxx # Choose the billing project. Has to be defined!
#SBATCH --time=00:02:00 # Maximum duration of the job. Upper limit depends on the partition.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --ntasks=1 # Number of tasks. Upper limit depends on partition. For a serial job this should be set to 1!
echo -n "We are running on " # -n omits the newline, so the node name prints on the same line
hostname # Run hostname-command, that will print the name of the Puhti compute node that has been allocated for this particular job
sleep 60 # Run sleep-command, to keep the job running for an additional 60 seconds, in order to have time to monitor the job
echo "This job has finished"
In the batch job example above we are requesting:
- one core (--ntasks=1)
- for two minutes (--time=00:02:00)
- from the test queue (--partition=test)
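Since the #SBATCH lines are ordinary bash comments, the command section of the script can also be dry-run locally outside SLURM to check that it does what you expect. The sleep is shortened below so the check finishes quickly:

```shell
#!/bin/bash
# Local dry run of the script's command section: outside SLURM, the
# #SBATCH lines are plain comments and are simply ignored by bash.
echo -n "We are running on "   # -n omits the newline, so hostname prints on the same line
hostname                       # name of the current machine
sleep 1                        # shortened from 60 s for a quick local check
echo "This job has finished"
```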
Submit the job to the batch queue and check its status with the commands:
sbatch my_serial.bash
squeue --me
Once the job is completed, check how much of the resources was used with seff jobid (replace jobid with the number that was displayed after you ran the sbatch command).
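To avoid copying the job ID by hand, sbatch's --parsable option prints only the ID, which you can store in a variable and pass to seff later. This is a sketch: it requires a system with SLURM installed, so it is guarded to be a harmless no-op elsewhere:

```shell
# Sketch: capture the job ID at submission time (requires SLURM; guarded
# so the snippet does nothing harmful on machines without it).
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable my_serial.bash)   # --parsable prints only the job ID
    echo "Submitted job $jobid"
    squeue --me                                 # check the job's status
    # ...once the job has completed:
    seff "$jobid"                               # resource-usage summary
else
    echo "sbatch not available - run this on Puhti"
fi
```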
Additional exercises
- Where can you find the output of the hostname command?
- How could you add a name to the job for easier identification?
- What happens if you run the same script from above, but request only one minute and sleep for two minutes?
- Can you run the gdalinfo command from the interactive job above via a non-interactive job? What do you need to change in the sbatch job script?
Solution
- The output appears in slurm-jobid.out in the directory from which you submitted the batch job. You can change that location by specifying it in your batch job script with #SBATCH --output=/your/path/slurm-%j.out.
- Add #SBATCH --job-name=myname to the resource request at the top of your sbatch script to rename the job to “myname”.
- After the job has finished, check the log file with cat slurm-<jobid>.out. You should see an error at the end: slurmstepd: error: *** JOB xxx ON xxx CANCELLED AT xDATE-TIMEx DUE TO TIME LIMIT ***. This means that our job was killed for exceeding the amount of resources requested. Although this appears harsh, it is actually a feature: strict adherence to resource requests allows the scheduler to find the best possible place for your jobs, and it ensures fair use of the computing resources.
- Since gdalinfo is quite a fast command, you only need to change the script part of your sbatch script; the resource request can stay the same. First make gdal available within the job with module load geoconda, then run the gdalinfo command. After the job is done, you can find the output again in the slurm-jobid.out file:
#!/bin/bash
#SBATCH --account=<project> # Choose the billing project. Has to be defined!
#SBATCH --time=00:02:00 # Maximum duration of the job. Upper limit depends on the partition.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, hugemem_longrun
#SBATCH --ntasks=1 # Number of tasks. Upper limit depends on partition. For a serial job this should be set to 1!
module load geoconda
gdalinfo /appl/data/geo/luke/forest_wind_damage_sensitivity/2017/windmap2017_int1k_metsamaa2_cog.tif
Key points
- A batch job script combines resource estimates and computation steps
- Resource request lines start with #SBATCH
- You can find the job’s output and errors in slurm-jobid.out