This tutorial requires that
This tutorial is done on Puhti.
💬 HyperQueue is a tool for efficient sub-node task scheduling, and it is well suited for task farming and running embarrassingly parallel jobs.
💬 In this example, we have several similar molecular structures and would like to know how they differ energetically.
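This tutorial shows how to:
- use the sbatch-hq wrapper, which allows easy execution of many commands without needing to write a batch script
- create a sbatch-hq command list to run the jobs
- run the jobs with the sbatch-hq wrapper
Create a working directory and download the example data (replace <project> with your CSC project, e.g. project_2001234):
mkdir -p /scratch/<project>/$USER/gaussian-hq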
cd /scratch/<project>/$USER/gaussian-hq
wget https://a3s.fi/CSC_training/C7O2H10.tar.gz
tar -xzf C7O2H10.tar.gz
Go to the extracted directory, which contains the molecular structures in .mol format:
cd C7O2H10
💬 Gaussian is a program for molecular electronic structure calculations.
module load openbabel
obabel *.mol -ocom -m
This converts each .mol file into the .com format used by Gaussian. The -m option writes one output file per input structure.
💬 In this example we want to do a b3lyp/cc-pVDZ calculation on these structures, i.e. a hybrid density functional theory calculation using the B3LYP exchange-correlation functional and the cc-pVDZ basis set.
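Before you edit the inputs, you can confirm that Open Babel wrote one .com file for every .mol file. This is a minimal sketch and assumes you run it inside the C7O2H10 directory. The function name check_conversion is hypothetical. The demo part works on empty placeholder files in a throwaway directory, not on real structures.

```shell
# Hypothetical helper: print how many .mol files in a directory lack a .com file.
check_conversion() {
    local dir=$1 missing=0 mol
    for mol in "$dir"/*.mol; do
        [ -e "$mol" ] || continue              # glob matched nothing
        if [ ! -e "${mol%.mol}.com" ]; then
            echo "no .com file for $mol" >&2   # name the culprit on stderr
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
}

# Demo on empty placeholder files (not real structures): b.mol has no .com.
demo=$(mktemp -d)
touch "$demo/a.mol" "$demo/a.com" "$demo/b.mol"
missing=$(check_conversion "$demo")
echo "missing .com files: $missing"
```

In the tutorial directory itself, check_conversion . should print 0 after the obabel step.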
Add the b3lyp/cc-pVDZ keyword at the beginning of each .com file:
sed -i '1s/^/#b3lyp\/cc-pVDZ \n/' *.com
Then add the line %NProcShared=4 at the top of each input file, so that each calculation uses 4 cores:
sed -i '1s/^/%NProcShared=4\n/' *.com
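You can try the two sed commands on a throwaway file to see what they do and why their order matters. Gaussian expects Link 0 lines such as %NProcShared before the # route line. Because each command prepends to line 1, the command run last ends up on top. This is a sketch assuming GNU sed, which understands \n in the replacement. The file contents below are a hypothetical placeholder, not real Open Babel output.

```shell
# Placeholder input: NOT real Open Babel output, just something to edit.
demo=$(mktemp -d)
printf 'title line\n\n0 1\nC 0.0 0.0 0.0\n\n' > "$demo/test.com"

# The tutorial's two commands, in the same order, each prepending one line.
sed -i '1s/^/#b3lyp\/cc-pVDZ \n/' "$demo/test.com"
sed -i '1s/^/%NProcShared=4\n/' "$demo/test.com"

# Expect the Link 0 line, then the route line, then the original first line.
head -n 3 "$demo/test.com"
```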
💬 A task list can be long, so rather than typing it by hand it is more practical to generate the task list file for HyperQueue with a short bash loop.
Move back to the parent directory and create the task list file, called commandlist:
cd ..
for f in ${PWD}/C7O2H10/*.com; do echo "g16 < $f >> output/$(basename ${f%.*}).log" >> commandlist; done
Check the contents of commandlist with more, less or cat. The file should look like:
g16 < /scratch/<project>/$USER/gaussian-hq/C7O2H10/dsC7O2H10nsd_0001.com >> output/dsC7O2H10nsd_0001.log
g16 < /scratch/<project>/$USER/gaussian-hq/C7O2H10/dsC7O2H10nsd_0002.com >> output/dsC7O2H10nsd_0002.log
g16 < /scratch/<project>/$USER/gaussian-hq/C7O2H10/dsC7O2H10nsd_0003.com >> output/dsC7O2H10nsd_0003.log
...
The commands write their Gaussian output to a directory called output, which does not exist yet. Create this directory:
mkdir -p output
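The loop that builds commandlist appends with >>, so running it a second time would duplicate every command. The sketch below is a slightly more defensive variant. It empties the file first and then checks that there is one command per input. It runs on a hypothetical throwaway directory with empty placeholder .com files. For real use, set workdir=$PWD in the gaussian-hq directory and drop the mktemp and touch setup lines.

```shell
# Hypothetical demo directory with two empty placeholder inputs.
workdir=$(mktemp -d)
mkdir -p "$workdir/C7O2H10" "$workdir/output"
touch "$workdir/C7O2H10/mol_0001.com" "$workdir/C7O2H10/mol_0002.com"

# Start from an empty command list so re-running does not duplicate lines.
: > "$workdir/commandlist"
for f in "$workdir"/C7O2H10/*.com; do
    # One Gaussian run per input; the log goes to output/<name>.log
    echo "g16 < $f >> output/$(basename "${f%.*}").log" >> "$workdir/commandlist"
done

# Sanity check: one command per input file.
n_inputs=$(ls "$workdir"/C7O2H10/*.com | wc -l)
n_cmds=$(wc -l < "$workdir/commandlist")
echo "inputs: $n_inputs, commands: $n_cmds"
```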
💬 Running a HyperQueue task array is similar to running a Slurm array job. However, HyperQueue packs the individual tasks within a single Slurm job step, which makes it much more efficient, especially when there is a huge number of tasks. Submitting the job is also easy, because the sbatch-hq wrapper creates the batch script for us, so we do not have to write one by hand.
Load the modules and submit the task list with sbatch-hq:
module load sbatch-hq gaussian
sbatch-hq --cores=4 --nodes=1 --account=<project> --partition=small --time=00:15:00 commandlist
💬 The sbatch-hq command creates and submits a batch script. That script starts the HyperQueue server and worker(s), then submits the task array, reading the commands from the commandlist file. The following resources are requested:
- One node (--nodes=1), i.e. 40 cores in total
- 4 cores per task (--cores=4), matching %NProcShared=4 in each Gaussian input file
- A maximum run time of 15 minutes (--time=00:15:00)
- Billing to your project (--account=<project>, replace <project> accordingly)
- The small partition
💬 Since 40 cores are requested for running 200 tasks that each use 4 cores, 10 tasks can run concurrently. The number of commands in the file can (and usually should) be much larger than the number that fit on the reserved resources at the same time. This avoids creating very short Slurm jobs.
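The concurrency figure above follows from integer arithmetic. This small sketch uses the tutorial's values and takes the 40 cores per node from the text. It also assumes all tasks take about equally long, which real Gaussian runs will not exactly do, so the rounds and per-round time budget are only rough estimates. The variable names are illustrative.

```shell
# Values from this tutorial; 40 cores per node is taken from the text above.
cores_per_node=40          # one node requested with --nodes=1
nodes=1
cores_per_task=4           # --cores=4, matching %NProcShared=4
n_tasks=200                # one line per structure in commandlist
walltime_s=$((15 * 60))    # --time=00:15:00

# Tasks that fit on the allocation at the same time.
concurrent=$(( nodes * cores_per_node / cores_per_task ))
# Rounds of tasks if every task took about as long (rounded up).
rounds=$(( (n_tasks + concurrent - 1) / concurrent ))
# Upper bound on the average time per round for everything to fit.
budget_s=$(( walltime_s / rounds ))

echo "concurrent=$concurrent rounds=$rounds budget_per_round=${budget_s}s"
```

With these numbers, 10 tasks run at once in about 20 rounds. Each round would need to finish in under roughly 45 seconds on average to fit in the 15-minute limit, ignoring HyperQueue start-up time.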
Monitor the job with squeue (replace <slurmjobid> with the assigned Slurm job ID):
squeue -j <slurmjobid>
# or
squeue --me
# or
squeue -u $USER
To follow the individual tasks, point the hq commands to the HyperQueue server directory of your job:
export HQ_SERVER_DIR=$PWD/hq-server-<slurmjobid> # replace <slurmjobid> with the actual id of your Slurm job
hq job info 1
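Besides squeue and hq job info, you can estimate progress from the log files themselves. A successful Gaussian run ends with a line containing "Normal termination of Gaussian". This is a minimal sketch meant to be run from the gaussian-hq directory as count_finished output. The function name is hypothetical, and the demo writes two fake log files rather than real Gaussian output.

```shell
# Hypothetical helper: count logs in a directory that finished normally.
count_finished() {
    local dir=$1
    # -l lists each matching file once; 2>/dev/null hides "no such file" noise
    grep -l "Normal termination of Gaussian" "$dir"/*.log 2>/dev/null | wc -l
}

# Demo with fake log files (placeholders, not real Gaussian output).
demo=$(mktemp -d)
echo " Normal termination of Gaussian 16" > "$demo/done.log"
echo " SCF Done: still running" > "$demo/running.log"
finished=$(count_finished "$demo")
echo "$finished of $(ls "$demo"/*.log | wc -l) calculations finished"
```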
When the job has finished, collect the b3lyp/cc-pVDZ energies of all 200 structures and sort them by energy (most stable structure first):
grep -r "E(RB3LYP)" output | sort -k6 -n -o energies.txt
Inspect the lowest-energy structures with:
head energies.txt
The output should look like:
output/dsC7O2H10nsd_0015.log: SCF Done: E(RB3LYP) = -423.218630672 A.U. after 14 cycles
output/dsC7O2H10nsd_0192.log: SCF Done: E(RB3LYP) = -423.216601925 A.U. after 12 cycles
output/dsC7O2H10nsd_0193.log: SCF Done: E(RB3LYP) = -423.214963908 A.U. after 12 cycles
output/dsC7O2H10nsd_0028.log: SCF Done: E(RB3LYP) = -423.214781165 A.U. after 13 cycles
output/dsC7O2H10nsd_0037.log: SCF Done: E(RB3LYP) = -423.214421420 A.U. after 14 cycles
output/dsC7O2H10nsd_0026.log: SCF Done: E(RB3LYP) = -423.214326717 A.U. after 14 cycles
output/dsC7O2H10nsd_0008.log: SCF Done: E(RB3LYP) = -423.213824577 A.U. after 14 cycles
output/dsC7O2H10nsd_0036.log: SCF Done: E(RB3LYP) = -423.212123483 A.U. after 14 cycles
output/dsC7O2H10nsd_0025.log: SCF Done: E(RB3LYP) = -423.212093937 A.U. after 14 cycles
output/dsC7O2H10nsd_0191.log: SCF Done: E(RB3LYP) = -423.211777369 A.U. after 13 cycles
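Absolute SCF energies in Hartree (A.U.) are hard to compare at a glance. This sketch uses awk to report each structure's energy relative to the most stable one, in kcal/mol. It assumes the energies.txt layout shown above, with the energy in whitespace-separated field 6 and the lowest energy on the first line. It also uses the conversion 1 Hartree ≈ 627.5095 kcal/mol. The function name is hypothetical, and the demo uses the first two sample lines above.

```shell
# Hypothetical helper: file name and energy relative to the first entry (kcal/mol).
relative_energies() {
    awk 'NR == 1 { e0 = $6 }
         { printf "%s %.2f\n", $1, ($6 - e0) * 627.5095 }' "$1"
}

# Demo input: the first two lines of the sample output above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
output/dsC7O2H10nsd_0015.log: SCF Done: E(RB3LYP) = -423.218630672 A.U. after 14 cycles
output/dsC7O2H10nsd_0192.log: SCF Done: E(RB3LYP) = -423.216601925 A.U. after 12 cycles
EOF
rel=$(relative_energies "$demo")
echo "$rel"
```

On the real data, relative_energies energies.txt | head lists the ten most stable structures with their energy gaps to the lowest one.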