Using Allas in CSC’s HPC environment
Before the actual exercise, open a view to the Allas service in your browser using the Puhti web interface.
- Go to https://www.puhti.csc.fi and log in with your account.
- Configure an Allas S3 connection using the Cloud storage configuration tool.
- You need to first authenticate by providing your CSC password.
- If you have several projects available, choose one that you want to use in this exercise.
- Once you’ve configured a connection, select `s3allas-project_<id>` from the Files dropdown menu in the top navigation bar. Replace `<id>` with the number of the project you chose to use (e.g. 2001234).
- During the exercise, you can use this web interface to get another view of the buckets and objects in Allas.
1. Log in to Puhti
- Log in to Puhti (open a login node shell if using the web interface):

      ssh <username>@puhti.csc.fi # replace <username> with your CSC username

- In Puhti, check your environment with the command:

      csc-workspaces

- Move to the `/scratch` directory of your project:

      cd /scratch/<project> # replace <project> with your CSC project, e.g. project_2001234

- Create your own subdirectory named with your username:

      mkdir -p $USER

- Move to the directory:

      cd $USER
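
The directory steps above can be combined into one small helper. This is only a sketch: the function name `make_workdir` is hypothetical, and it assumes the project directory already exists under the given base directory (on Puhti, `/scratch`).

```shell
# Hypothetical helper: create (if needed) and print a personal work directory.
# $1 = base directory (e.g. /scratch), $2 = project name (e.g. project_2001234)
make_workdir() {
    base="$1"
    project="$2"
    user="${USER:-$(id -un)}"   # fall back to id if $USER is unset
    if [ ! -d "$base/$project" ]; then
        echo "No such project directory: $base/$project" >&2
        return 1
    fi
    mkdir -p "$base/$project/$user" && printf '%s\n' "$base/$project/$user"
}

# Example use on Puhti (not run here):
#   cd "$(make_workdir /scratch project_2001234)"
```

Failing early when the project directory is missing avoids accidentally creating a directory tree in the wrong place after a typo in the project name.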
2. Download data with wget
- Next, download a dataset and uncompress it.
- The dataset contains some Pythium genomes with related BWA indexes:

      wget https://a3s.fi/course_12.11.2019/pythium.tgz
      tar -xzvf pythium.tgz
      tree pythium
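
Before extracting a downloaded archive, it can be useful to check what it will unpack. The sketch below lists the top-level entries of a gzip-compressed tar archive; the function name `list_top_level` is hypothetical and not part of the exercise.

```shell
# Hypothetical helper: print the unique top-level entries of a .tgz archive.
# $1 = path to the archive
list_top_level() {
    # -t lists contents, -z handles gzip, -f names the archive file
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Example use after the wget step (not run here):
#   list_top_level pythium.tgz
```

If the archive unpacks into a single directory, as the `tree pythium` step suggests, this should print just that directory name.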
3. Using Allas
- Open a connection to Allas:

      module load allas
      allas-conf

- If you have several Allas projects available, select the same project as earlier.
Upload case 1: rclone
- Upload the data from Puhti to Allas with `rclone`:

      rclone -P copyto pythium allas:$USER-genomes-rc/

- How long did the data upload take?
- What was the transfer rate?
- How long would it take to transfer 100 GiB assuming the same speed?
- Study what you have uploaded to Allas with the commands:

      rclone lsd allas:
      rclone ls allas:$USER-genomes-rc/
      rclone lsl allas:$USER-genomes-rc/
      rclone lsf allas:$USER-genomes-rc/

- Check how this looks in the Puhti web interface. Open a browser and go to https://www.puhti.csc.fi/
- In the Puhti web interface, go to the Files app and select `s3allas-project_<id>` to list the buckets of your project (replace `<id>` as needed).
- Locate your own `$USER-genomes-rc` bucket and download one of the uploaded fasta files to your local computer.
💡 You can read more about moving files at Docs CSC: Copying files using scp and Moving data with rclone
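
For the question about transferring 100 GiB at the observed speed, the arithmetic can be sketched as follows. The function name `estimate_seconds` and the rate of 50 MiB/s are made up for illustration; substitute the rate you observed, converted to MiB/s if it was reported in another unit.

```shell
# Hypothetical helper: estimate transfer time in seconds.
# $1 = data size in GiB, $2 = transfer rate in MiB/s
estimate_seconds() {
    # 1 GiB = 1024 MiB, so time = size * 1024 / rate
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", size * 1024 / rate }'
}

# Illustrative rate only: 100 GiB at 50 MiB/s
estimate_seconds 100 50   # prints 2048
```

At this illustrative rate the transfer would take 2048 seconds, or about 34 minutes. Real transfers rarely keep a constant rate, so treat the result as a rough estimate.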
Upload case 2: a-put
- Upload the pythium directory from Puhti to Allas using a-commands
- Case 1: Store everything as a single object (replace `<project number>` with your CSC project number, e.g. 2001234):

      a-put pythium
      a-list
      a-list <project number>-puhti-SCRATCH
      a-info <project number>-puhti-SCRATCH/$USER/pythium.tar

- Case 2: Each subdirectory (species) as a separate object (replace `<project number>` with your CSC project number, e.g. 2001234):

      a-put pythium/*
      a-list <project number>-puhti-SCRATCH
      a-check pythium/*
      a-info <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar

- Case 3: Use a custom bucket name (replace `<project number>` with your project number, e.g. 2001234):

      a-put pythium/* -b <project number>-$USER-genomes-ap
      a-list <project number>-$USER-genomes-ap

- Can you see the difference between the three `a-put` commands above?
- Study the `<project number>-$USER-genomes-ap` bucket with commands:

      a-list <project number>-$USER-genomes-ap
      rclone ls allas:<project number>-$USER-genomes-ap

- Why do the two commands above list a different number of objects?
- Try the command (replace `<project number>` with your project number, e.g. 2001234):

      a-info <project number>-$USER-genomes-ap/pythium_vexans.tar

- This command is actually the same as:

      rclone cat allas:<project number>-$USER-genomes-ap/pythium_vexans.tar_ameta

- Finally, try the command:

      a-flip pythium/pythium_vexans/pythium_vexans.fasta

- Try opening the public link that `a-flip` produced with your browser.
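
The bucket names above follow the pattern `<project number>-<username>-<suffix>`, where the project number is the project name without its `project_` prefix. To avoid retyping the number, the names can be built with a small helper. This is a sketch only; `project_number` and `bucket_name` are hypothetical names, not part of the Allas tools.

```shell
# Hypothetical helper: strip the "project_" prefix, e.g. project_2001234 -> 2001234
project_number() {
    printf '%s\n' "${1#project_}"
}

# Hypothetical helper: build a bucket name.
# $1 = project name, $2 = username, $3 = suffix (e.g. genomes-ap)
bucket_name() {
    printf '%s-%s-%s\n' "$(project_number "$1")" "$2" "$3"
}

# Example use on Puhti (not run here):
#   a-list "$(bucket_name project_2001234 "$USER" genomes-ap)"
```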
Upload case 3: allas-backup
- Run the commands:

      allas-backup -help
      allas-backup pythium
      allas-backup list

- What did these commands do to your data?
4. Exit
- The data in the `pythium` directory is now stored in many ways in Allas, so we can remove the data from Puhti and log out:

      rm -r pythium
      exit
5. Downloading data from Allas to Puhti
- Log in to Puhti and move to your personal directory in your project’s `/scratch`:

      ssh <username>@puhti.csc.fi # replace <username> with your CSC username
      cd /scratch/<project>/$USER # replace <project> with your CSC project, e.g. project_2001234

- In Puhti, check your projects with the command:

      csc-workspaces

- Set up the Allas connection:

      module load allas
      allas-conf

- Then run the commands (we will use the same bucket that was created earlier):

      a-list
      rclone lsd allas:
      # replace <project number> with your project number, e.g. 2001234
      a-list <project number>-$USER-genomes-ap
      rclone ls allas:<project number>-$USER-genomes-ap
      a-find pythium_vexans.fasta
      a-find -a pythium_vexans.fasta

- Next, download the data in different ways:
1. Download with rclone
- Copy everything:

      mkdir rclone_dir
      cd rclone_dir/
      mkdir all
      rclone ls allas:<project number>-$USER-genomes-ap
      rclone copyto -P allas:<project number>-$USER-genomes-ap all/
      ls all

- Copy a set of objects:

      mkdir vexans
      rclone copyto allas:$USER-genomes-rc/pythium_vexans vexans/
      ls vexans

- Copy just one object:

      rclone copyto allas:$USER-genomes-rc/pythium_vexans/pythium_vexans.fasta ./vexans.fasta
      ls
2. Download with a-get
- Return to your `$USER` directory under your project’s `/scratch` on Puhti (the `pwd` command should print `/scratch/<project>/$USER`):

      cd ..
      pwd

- Make a new directory:

      mkdir a_dir
      cd a_dir/

- Create a directory `all` and move there:

      mkdir all
      cd all

- List your default `SCRATCH` bucket (replace `<project number>` with your project number, e.g. 2001234):

      a-list <project number>-puhti-SCRATCH
      a-list <project number>-puhti-SCRATCH/$USER

- Look for the file `pythium_vexans.fasta` in your Puhti `SCRATCH` bucket:

      a-find pythium_vexans.fasta -b <project number>-puhti-SCRATCH # replace <project number> with your project number, e.g. 2001234

- Download the full dataset with the command:

      a-get <project number>-puhti-SCRATCH/$USER/pythium.tar # replace <project number> with your project number, e.g. 2001234

- Check what you got:

      ls -l
      ls -R

- Now, download just a single genome dataset:

      cd ..
      a-get <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar # replace <project number> with your project number, e.g. 2001234
      ls -l pythium/
      ls -l pythium/pythium_vexans/
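
At this point the same genome has been downloaded both with `rclone` and with `a-get`, so the two copies can be compared as a sanity check. The sketch below uses a hypothetical helper `same_tree`. The example paths assume that you are still in `a_dir` and that the `rclone` copy is in `../rclone_dir/vexans`, as in the steps above.

```shell
# Hypothetical helper: report whether two directory trees have identical contents.
# $1, $2 = directories to compare
same_tree() {
    # diff -r compares the trees recursively; its exit status is 0 only if they match
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example use on Puhti (not run here):
#   same_tree ../rclone_dir/vexans pythium/pythium_vexans
```

If both tools transferred the data intact, the two directories should be reported as identical.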
3. Downloading data from allas-backup
- Return to your main scratch directory and make a new directory:

      cd ..
      mkdir a_backup
      cd a_backup/

- Use the commands below to find out the ID of the most recent backup version of your pythium directory:

      allas-backup list
      allas-backup list | grep $USER

- Use `allas-backup restore` to download the data:

      allas-backup restore <id string> # replace <id string> with the ID of your backup snapshot
      ls -l
      ls -l pythium