Using Allas in CSC’s HPC environment
Before the actual exercise, open a view of the Allas service in your browser using the Puhti web interface.
- Go to https://www.puhti.csc.fi and log in with your account.
- Configure an Allas S3 connection using the Cloud storage configuration tool.
- You need to first authenticate by providing your CSC password.
- If you have several projects available, choose one that you want to use in this exercise.
- Once you’ve configured a connection, select s3allas-project_<id> from the Files dropdown menu in the top navigation bar. Replace <id> with the number of the project you chose to use (e.g. 2001234).
- During the exercise, you can use this web interface to get another view of the buckets and objects in Allas.
1. Login to Puhti
-
Log in to Puhti (open a login node shell if using the web interface):
ssh <username>@puhti.csc.fi # replace <username> with your CSC username
-
In Puhti, check your environment with the command:
csc-workspaces
-
Move to the /scratch directory of your project:
cd /scratch/<project> # replace <project> with your CSC project, e.g. project_2001234
-
Create your own subdirectory named with your username:
mkdir -p $USER
-
Move to the directory:
cd $USER
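The directory steps above can be combined into a small shell function. This is a minimal sketch: setup_workdir is a hypothetical helper, not a CSC command. The scratch root is passed as an argument (on Puhti it would be /scratch), so you can try the function in any directory.

```shell
# Hypothetical helper (not a CSC command): create the personal working
# directory <scratch_root>/<project>/$USER and print its path.
setup_workdir() {
  local scratch_root="${1:?usage: setup_workdir <scratch_root> <project>}"
  local project="${2:?usage: setup_workdir <scratch_root> <project>}"
  local dir="$scratch_root/$project/${USER:?USER is not set}"
  mkdir -p -- "$dir" && printf '%s\n' "$dir"
}

# Usage on Puhti (replace project_2001234 with your own project):
#   cd "$(setup_workdir /scratch project_2001234)"
```

Because of mkdir -p, running the function a second time does no harm. It simply prints the existing path again.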
2. Download data with wget
- Next, download a dataset and uncompress it
- The dataset contains some pythium genomes with related BWA indexes
wget https://a3s.fi/course_12.11.2019/pythium.tgz
tar -xzvf pythium.tgz
tree pythium
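To see what the archive contains before unpacking it, you can list it with tar without extracting anything. The sketch below is only an illustration. Both function names are hypothetical, and the listing uses find instead of tree in case tree is not installed.

```shell
# Hypothetical helper: preview the first entries of a .tgz archive,
# then extract it into the current directory.
preview_and_extract() {
  local archive="${1:?usage: preview_and_extract <file.tgz>}"
  tar -tzf "$archive" | head -n 20   # list only, nothing is written yet
  tar -xzf "$archive"
}

# Hypothetical helper: count the FASTA files below a directory.
count_fasta() {
  find "${1:?usage: count_fasta <dir>}" -type f -name '*.fasta' | wc -l
}

# Usage:
#   preview_and_extract pythium.tgz
#   count_fasta pythium
```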
3. Using Allas
-
Open a connection to Allas:
module load allas
allas-conf
-
If you have several Allas projects available, select the same project as earlier
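The rclone commands below all use a remote called allas. To check that allas-conf has set it up, list the configured remotes with rclone listremotes, which prints one name per line followed by a colon. This is a minimal sketch, and has_remote is a hypothetical helper that reads that list from standard input.

```shell
# Hypothetical helper: succeed if "<name>:" appears as a whole line in the
# `rclone listremotes` output read from stdin.
has_remote() {
  local name="${1:?usage: has_remote <remote_name>}"
  grep -qxF -- "${name}:"
}

# Usage on Puhti:
#   if rclone listremotes | has_remote allas; then
#     echo "allas remote is configured"
#   else
#     echo "allas remote not found: run allas-conf" >&2
#   fi
```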
Upload case 1: rclone
-
Upload the data from Puhti to Allas with rclone:
rclone -P copyto pythium allas:$USER-genomes-rc/
- How long did the data upload take?
- What was the transfer rate?
- How long would it take to transfer 100 GiB assuming the same speed?
-
Study what you have uploaded to Allas with the commands:
rclone lsd allas:
rclone ls allas:$USER-genomes-rc/
rclone lsl allas:$USER-genomes-rc/
rclone lsf allas:$USER-genomes-rc/
- Check what this looks like in the Puhti web interface. Open a browser and go to https://www.puhti.csc.fi/
- In the Puhti web interface, go to the Files app and select s3allas-project_<id> to list the buckets of your project (replace <id> as needed).
- Locate your own $USER-genomes-rc bucket and download one of the uploaded fasta files to your local computer.
💡 You can read more about moving files at Docs CSC: Copying files using scp and Moving data with rclone
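The question about transferring 100 GiB in the rclone upload step is simple arithmetic: time = size / rate. This sketch assumes you read the rate from the rclone -P output in MiB/s. If your rclone version reports a different unit, convert it first. For example, 100 GiB is 102400 MiB, so at an assumed 100 MiB/s the transfer takes 1024 s, which is about 17 minutes. Both function names are hypothetical.

```shell
# Hypothetical helper: estimated transfer time in seconds,
# given a size in GiB and a rate in MiB/s (1 GiB = 1024 MiB).
transfer_time_s() {
  awk -v gib="${1:?size in GiB}" -v rate="${2:?rate in MiB/s}" \
    'BEGIN { printf "%.0f\n", gib * 1024 / rate }'
}

# Hypothetical helper: format a whole number of seconds as HH:MM:SS.
fmt_hms() {
  local s="${1:?seconds}"
  printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Example: 100 GiB at an assumed 100 MiB/s
fmt_hms "$(transfer_time_s 100 100)"   # prints 00:17:04
```

To answer the exercise question, replace the second argument with the rate you measured.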
Upload case 2: a-put
- Upload the pythium directory from Puhti to Allas using a-commands
-
Case 1: Store everything as a single object (replace <project number> with your CSC project number, e.g. 2001234):
a-put pythium
a-list
a-list <project number>-puhti-SCRATCH
a-info <project number>-puhti-SCRATCH/$USER/pythium.tar
-
Case 2: Each subdirectory (species) as a separate object (replace <project number> with your CSC project number, e.g. 2001234):
a-put pythium/*
a-list <project number>-puhti-SCRATCH
a-check pythium/*
a-info <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar
-
Case 3: Use a custom bucket name (replace <project number> with your project number, e.g. 2001234):
a-put pythium/* -b <project number>-$USER-genomes-ap
a-list <project number>-$USER-genomes-ap
- Can you see the difference between the three a-put commands above?
-
Study the <project number>-$USER-genomes-ap bucket with the commands:
a-list <project number>-$USER-genomes-ap
rclone ls allas:<project number>-$USER-genomes-ap
- Why do the two commands above list a different number of objects?
-
Try the command (replace <project number> with your project number, e.g. 2001234):
a-info <project number>-$USER-genomes-ap/pythium_vexans.tar
-
This command is actually the same as:
rclone cat allas:<project number>-$USER-genomes-ap/pythium_vexans.tar_ameta
-
Finally, try the command:
a-flip pythium/pythium_vexans/pythium_vexans.fasta
- Try opening the public link that a-flip produced in your browser.
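The rclone cat command above reads a metadata object whose name ends in _ameta. rclone ls lists these objects alongside the data. a-list does not show them. The sketch below counts only the data objects in rclone ls output. It is a hypothetical helper and assumes each line holds a size followed by an object name without spaces.

```shell
# Hypothetical helper: count objects in `rclone ls` output read from stdin,
# skipping metadata objects whose names end in "_ameta".
count_data_objects() {
  awk 'NF && $NF !~ /_ameta$/ { n++ } END { print n + 0 }'
}

# Usage on Puhti (replace <project number>):
#   rclone ls allas:<project number>-$USER-genomes-ap | count_data_objects
```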
Upload case 3: allas-backup
-
Run the commands:
allas-backup -help
allas-backup pythium
allas-backup list
-
What did these commands do to your data?
4. Exit
-
The data in the pythium directory is now stored in many ways in Allas, so we can remove the data from Puhti and log out:
rm -r pythium
exit
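Before you delete local data, you may want to confirm that the copy in Allas is complete. rclone check compares a source with a destination and exits with a non-zero status if they differ. The sketch below removes the directory only when that check passes. It compares against the $USER-genomes-rc bucket that received the rclone upload. remove_if_backed_up is a hypothetical helper, not part of the exercise tooling.

```shell
# Hypothetical helper: delete a local directory only if `rclone check`
# reports that the remote copy matches it (non-zero exit = differences).
remove_if_backed_up() {
  local dir="${1:?usage: remove_if_backed_up <local_dir> <remote:path>}"
  local remote="${2:?usage: remove_if_backed_up <local_dir> <remote:path>}"
  if rclone check "$dir" "$remote"; then
    rm -r -- "$dir"
  else
    echo "Keeping $dir: rclone check did not confirm the remote copy" >&2
    return 1
  fi
}

# Usage on Puhti, after the rclone upload earlier in this exercise:
#   remove_if_backed_up pythium "allas:$USER-genomes-rc/"
```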
5. Downloading data from Allas to Puhti
-
Log in to Puhti and move to your personal directory in your project’s /scratch:
ssh <username>@puhti.csc.fi # replace <username> with your CSC username
cd /scratch/<project>/$USER # replace <project> with your CSC project, e.g. project_2001234
-
In Puhti, check your projects with the command:
csc-workspaces
-
Set up the Allas connection:
module load allas
allas-conf
-
Then run the commands (we will use the same bucket that was created earlier):
a-list
rclone lsd allas:
# replace <project number> with your project number, e.g. 2001234
a-list <project number>-$USER-genomes-ap
rclone ls allas:<project number>-$USER-genomes-ap
a-find pythium_vexans.fasta
a-find -a pythium_vexans.fasta
-
Next, download the data in different ways:
1. Download with rclone
-
Copy everything:
mkdir rclone_dir
cd rclone_dir/
mkdir all
# replace <project number> with your project number, e.g. 2001234
rclone ls allas:<project number>-$USER-genomes-ap
rclone copyto -P allas:<project number>-$USER-genomes-ap all/
ls all
-
Copy a set of objects:
mkdir vexans
rclone copyto allas:$USER-genomes-rc/pythium_vexans vexans/
ls vexans
-
Copy just one object:
rclone copyto allas:$USER-genomes-rc/pythium_vexans/pythium_vexans.fasta ./vexans.fasta
ls
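rclone filter flags give you another way to copy a subset of objects. Instead of naming a single prefix, you select objects by pattern. In rclone filtering, a pattern such as *.fasta without a leading / matches files in any directory. This is a sketch that assumes the $USER-genomes-rc bucket from the upload step. fetch_fasta is a hypothetical helper, while rclone copy and --include are standard rclone features.

```shell
# Hypothetical helper: copy only the FASTA files from a remote path into a
# local directory, keeping the remote directory structure.
fetch_fasta() {
  local remote="${1:?usage: fetch_fasta <remote:path> <local_dir>}"
  local dest="${2:?usage: fetch_fasta <remote:path> <local_dir>}"
  mkdir -p -- "$dest"
  rclone copy -P --include '*.fasta' "$remote" "$dest"
}

# Usage on Puhti:
#   fetch_fasta "allas:$USER-genomes-rc" fasta_only
#   find fasta_only -name '*.fasta'
```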
2. Download with a-get
-
Return to your $USER directory under your project’s /scratch on Puhti (the pwd command should print /scratch/<project>/$USER):
cd ..
pwd
-
Make a new directory:
mkdir a_dir
cd a_dir/
-
Create a directory all and move there:
mkdir all
cd all
-
List your default SCRATCH bucket (replace <project number> with your project number, e.g. 2001234):
a-list <project number>-puhti-SCRATCH
a-list <project number>-puhti-SCRATCH/$USER
-
Look for the file pythium_vexans.fasta in your Puhti SCRATCH bucket:
a-find pythium_vexans.fasta -b <project number>-puhti-SCRATCH # replace <project number> with your project number, e.g. 2001234
-
Download the full dataset with the command:
a-get <project number>-puhti-SCRATCH/$USER/pythium.tar # replace <project number> with your project number, e.g. 2001234
-
Check what you got:
ls -l
ls -R
-
Now, download just a single genome dataset:
cd ..
a-get <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar # replace <project number> with your project number, e.g. 2001234
ls -l pythium/
ls -l pythium/pythium_vexans/
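You now have the Pythium vexans data from two routes: the rclone copy in rclone_dir/vexans and the a-get download in a_dir/pythium/pythium_vexans. diff -r exits with status 0 only when two trees contain the same files with the same contents, so it gives a quick check that both downloads match. same_tree is a hypothetical helper. The paths in the usage line assume you followed the directory steps in this section.

```shell
# Hypothetical helper: succeed if two directory trees contain the same
# file names with identical contents (diff -r exits 0 only then).
same_tree() {
  local a="${1:?usage: same_tree <dir_a> <dir_b>}"
  local b="${2:?usage: same_tree <dir_a> <dir_b>}"
  if diff -rq "$a" "$b"; then
    echo "identical: $a and $b"
  else
    echo "differences found between $a and $b" >&2
    return 1
  fi
}

# Usage on Puhti, from the a_dir directory:
#   same_tree ../rclone_dir/vexans pythium/pythium_vexans
```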
3. Downloading data from allas-backup
-
Return to your main scratch directory and make a new directory:
cd ..
mkdir a_backup
cd a_backup/
-
Use the commands below to find out the ID of the most recent backup version of your pythium directory:
allas-backup list
allas-backup list | grep $USER
-
Use allas-backup restore to download the data:
allas-backup restore <id string> # replace <id string> with the ID of your backup snapshot
ls -l
ls -l pythium