Using Allas in batch jobs
Preparations
💬 The allas-conf command opens an Allas connection that is valid for eight hours.
- In interactive use, this eight-hour limit is not a problem, as allas-conf can be executed again to extend the validity of the connection.
- In batch jobs, the situation is different, as it may take more than eight hours before the job even starts.
- To be able to use Allas in a batch job, run allas-conf again with the -k option:
allas-conf -k
- Here, the option -k indicates that the password will be stored in the environment variable $OS_PASSWORD.
- With this variable defined, you no longer need to input your password when you re-execute allas-conf with the -k option and the Allas project name.
☝🏻 Note that if you mistype your password when using the -k option, you must run the command unset OS_PASSWORD before you can try again.
- Refresh the connection with the command:
allas-conf -k <project> # replace <project> with your CSC project, e.g. project_2001234
☝🏻 When $OS_PASSWORD is set, the a-commands (a-put, a-get, a-list, a-delete) automatically refresh the Allas connection when they are executed in a batch job.
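Because the a-commands rely on $OS_PASSWORD being available in the job's environment, it can be useful to check for it before submitting anything. The following is a minimal sketch, not part of the Allas tools: the function name check_allas_env is hypothetical, and the check only inspects the current shell.

```bash
#!/bin/bash
# Hypothetical helper (not part of allas-cli-utils): verify that
# 'allas-conf -k' has stored the password in OS_PASSWORD.
check_allas_env() {
    if [ -z "${OS_PASSWORD:-}" ]; then
        echo "OS_PASSWORD is not set. Run 'allas-conf -k' first." >&2
        return 1
    fi
    echo "OS_PASSWORD is set."
}
```

The function returns a non-zero status when the variable is missing, so it can be chained in front of the submission command with &&.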
- Choose a file from Allas. The file should have text in it.
a-list <project number>_$USER # replace <project number> with your CSC project number, e.g. 2001234, to match the bucket you created earlier
- Create a new batch job script. First open a new text file with the command:
nano allas_<myjobname>.sh # replace <myjobname> with a custom name for your job
- Copy the batch job script below to the text file you are editing:
- Option 1: a-commands
#!/bin/bash
#SBATCH --job-name=my_allas_job # Name of the job visible in the queue.
#SBATCH --account=<project> # Choose the billing project. Has to be defined!
#SBATCH --time=00:05:00 # Maximum duration of the job. Max: depends on the partition.
#SBATCH --mem-per-cpu=1G # How much RAM is reserved for one processor.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, $
#SBATCH --output=allas_output_%j.txt # Name of the output-file.
#SBATCH --error=allas_errors_%j.txt # Name of the error-file.
bucketname=<project number>_$USER # Replace with your bucket name, e.g. 2001234_username
filename=<filename> # Replace with your file name
a-get "$bucketname/$filename" # Download the file: bucket name / file name
wc -l "$filename" > "$filename.num_rows" # Count the rows and save the result
a-put -b "$bucketname" "$filename.num_rows" # Upload the result to the same bucket
- In the script, replace <project number>_$USER with your bucket name and <filename> with the name of the file you have in Allas. Remember to also define your billing project (--account).
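Before submitting, the processing step of the script can be tried locally without touching Allas. The sketch below is only an illustration: sample.txt is a hypothetical stand-in for the file that a-get would download, and the project number is an example value. Only the wc -l step is taken from the script above.

```bash
#!/bin/bash
# Example values (assumptions): replace with your own project number and file.
project_number=2001234
bucketname="${project_number}_${USER:-username}"
filename=sample.txt

# Create a three-line text file to stand in for the object fetched by a-get.
printf 'first\nsecond\nthird\n' > "$filename"

# Same processing step as in the batch script: count the rows.
wc -l "$filename" > "$filename.num_rows"

echo "Result that the job would upload to bucket $bucketname:"
cat "$filename.num_rows"
```

The result file contains the row count followed by the file name, which is what a-put uploads at the end of the batch job.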
- Option 2: rclone
💭 If you use rclone or swift instead of the a-commands, you need to add source allas_conf commands to your script.
#!/bin/bash
#SBATCH --job-name=my_allas_job # Name of the job visible in the queue.
#SBATCH --account=<project> # Choose the billing project. Has to be defined!
#SBATCH --time=00:05:00 # Maximum duration of the job. Max: depends on the partition.
#SBATCH --mem-per-cpu=1G # How much RAM is reserved for one processor.
#SBATCH --partition=test # Job queues: test, interactive, small, large, longrun, hugemem, $
#SBATCH --output=allas_output_%j.txt # Name of the output-file.
#SBATCH --error=allas_errors_%j.txt # Name of the error-file.
bucketname=<project number>_$USER # Replace with your bucket name, e.g. 2001234_username
filename=<filename> # Replace with your file name
# Make sure the connection to Allas is open
source /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf -f -k $OS_PROJECT_NAME
rclone copy "allas:$bucketname/$filename" ./
wc -l "$filename" > "$filename.num_rows"
# Make sure the connection to Allas is open
source /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf -f -k $OS_PROJECT_NAME
rclone copy "$filename.num_rows" "allas:$bucketname"
- Replace <project number>_$USER with your bucket name and <filename> with the name of the file you have in Allas. Remember to also define your billing project (--account).
- Submit the batch job with the command:
sbatch allas_<myjobname>.sh
- Monitor the progress of your batch job:
squeue -u $USER
- When the job has finished, check that the result file has appeared in your bucket:
a-list <project number>_$USER # replace <project number> with your CSC project number, e.g. 2001234, to match your bucket
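If the result file is missing from the bucket, the log files named by --output and --error in the scripts above are a natural place to look. The following is a minimal sketch under that assumption: check_job_logs is a hypothetical helper, and it expects to be run in the directory from which the job was submitted, with the job ID printed by sbatch as its argument.

```bash
#!/bin/bash
# Hypothetical helper: report on the log files of a finished job.
# File names follow the --output and --error patterns of the scripts above,
# where Slurm replaces %j with the job ID.
check_job_logs() {
    local jobid="$1"
    local out="allas_output_${jobid}.txt"
    local err="allas_errors_${jobid}.txt"

    if [ ! -e "$out" ] && [ ! -e "$err" ]; then
        echo "No log files found for job $jobid in $(pwd)." >&2
        return 2
    fi
    # -s is true when the file exists and is not empty.
    if [ -s "$err" ]; then
        echo "Job $jobid wrote messages to $err:" >&2
        cat "$err" >&2
        return 1
    fi
    echo "Job $jobid: no messages in $err."
}
```

A non-empty error file does not necessarily mean that the job failed, since tools may also print informational messages to standard error, so treat a return value of 1 as a prompt to read the file rather than as a verdict.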