Using Allas in CSC’s HPC environment
Before the actual exercise, open a view to the Allas service in your browser using the Puhti web interface.
- Go to https://www.puhti.csc.fi and log in with your account.
- Configure an Allas S3 connection using the Cloud storage configuration tool.
- You need to first authenticate by providing your CSC password.
- If you have several projects available, choose one that you want to use in this exercise.
- Once you’ve configured a connection, select s3allas-project_<id> from the Files dropdown menu in the top navigation bar. Replace <id> with the number of the project you chose to use (e.g. 2001234).
- During the exercise, you can use this web interface to get another view of the buckets and objects in Allas.
1. Log in to Puhti
- Log in to Puhti (open a login node shell if using the web interface):
ssh <username>@puhti.csc.fi # replace <username> with your CSC username
- In Puhti, check your environment with the command:
csc-workspaces
- Move to the /scratch directory of your project:
cd /scratch/<project> # replace <project> with your CSC project, e.g. project_2001234
- Create your own subdirectory named with your username:
mkdir -p $USER
- Move to the directory:
cd $USER
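The paths in this section use the full project name (e.g. project_2001234), while the default bucket names later in the exercise use only the project number. A minimal sketch of deriving one from the other with shell parameter expansion, assuming the example project project_2001234 (substitute your own). The variable names project and project_number are only illustrative:

```shell
# Example project name from this exercise; replace it with your own project.
project=project_2001234

# Strip the leading "project_" prefix to get the bare project number.
project_number=${project#project_}

echo "/scratch/$project"                  # prints /scratch/project_2001234
echo "${project_number}-puhti-SCRATCH"    # prints 2001234-puhti-SCRATCH
```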
2. Download data with wget
- Next, download a dataset and uncompress it
- The dataset contains some pythium genomes with related BWA indexes
wget https://a3s.fi/course_12.11.2019/pythium.tgz
tar -xzvf pythium.tgz
tree pythium
3. Using Allas
- Open a connection to Allas:
module load allas
allas-conf
- If you have several Allas projects available, select the same project as earlier
Upload case 1: rclone
- Upload the data from Puhti to Allas with rclone:
rclone -P copyto pythium allas:$USER-genomes-rc/
- How long did the data upload take?
- What was the transfer rate?
- How long would it take to transfer 100 GiB assuming the same speed?
- Study what you have uploaded to Allas with the commands:
rclone lsd allas:
rclone ls allas:$USER-genomes-rc/
rclone lsl allas:$USER-genomes-rc/
rclone lsf allas:$USER-genomes-rc/
- Check what this looks like in the Puhti web interface. Open a browser and go to https://www.puhti.csc.fi/
- In the Puhti web interface, go to the Files app and select s3allas-project_<id> to list the buckets of your project (replace <id> as needed).
- Locate your own $USER-genomes-rc bucket and download one of the uploaded fasta files to your local computer
💡 You can read more about moving files at Docs CSC: Copying files using scp and Moving data with rclone
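For the question above about transferring 100 GiB, the arithmetic is simply size divided by rate. A small sketch, assuming you read the average rate in MiB/s from the progress output of rclone -P. The function name estimate_transfer_seconds and the 50 MiB/s figure are hypothetical, not measured values:

```shell
# Estimate how long a transfer takes at a constant rate.
# $1 = data size in GiB, $2 = transfer rate in MiB/s
estimate_transfer_seconds() {
    awk -v gib="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", gib * 1024 / rate }'
}

# Hypothetical example: 100 GiB at 50 MiB/s
estimate_transfer_seconds 100 50   # prints 2048 (seconds, roughly 34 minutes)
```

Replace 50 with the rate you actually observed. Real transfers rarely hold a constant speed, so treat the result as a rough estimate.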
Upload case 2: a-put
- Upload the pythium directory from Puhti to Allas using a-commands
- Case 1: Store everything as a single object (replace <project number> with your CSC project number, e.g. 2001234):
a-put pythium
a-list
a-list <project number>-puhti-SCRATCH
a-info <project number>-puhti-SCRATCH/$USER/pythium.tar
- Case 2: Each subdirectory (species) as a separate object (replace <project number> with your CSC project number, e.g. 2001234):
a-put pythium/*
a-list <project number>-puhti-SCRATCH
a-check pythium/*
a-info <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar
- Case 3: Use a custom bucket name (replace <project number> with your project number, e.g. 2001234):
a-put pythium/* -b <project number>-$USER-genomes-ap
a-list <project number>-$USER-genomes-ap
- Can you see the difference between the three a-put commands above?
- Study the <project number>-$USER-genomes-ap bucket with commands:
a-list <project number>-$USER-genomes-ap
rclone ls allas:<project number>-$USER-genomes-ap
- Why do the two commands above list a different number of objects?
- Try the command (replace <project number> with your project number, e.g. 2001234):
a-info <project number>-$USER-genomes-ap/pythium_vexans.tar
- This command is actually the same as:
rclone cat allas:<project number>-$USER-genomes-ap/pythium_vexans.tar_ameta
- Finally, try the command:
a-flip pythium/pythium_vexans/pythium_vexans.fasta
- Try opening the public link that a-flip produced with your browser
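To explore the listing question above, it can help to filter the rclone output. The sketch below assumes that the extra objects are the metadata objects whose names end in _ameta (the suffix used in the rclone cat command above). The function name hide_ameta and the sample object names are made up for illustration:

```shell
# Drop metadata objects (names ending in _ameta) from a listing read on stdin.
hide_ameta() {
    grep -v '_ameta$'
}

# Intended use on Puhti (needs an active Allas connection):
#   rclone lsf allas:<project number>-$USER-genomes-ap | hide_ameta

# Offline demonstration with made-up object names:
printf '%s\n' pythium_vexans.tar pythium_vexans.tar_ameta | hide_ameta   # prints pythium_vexans.tar
```

Comparing the filtered listing with the a-list output is one way to check whether this assumption explains the difference.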
Upload case 3: allas-backup
- Run the commands:
allas-backup -help
allas-backup pythium
allas-backup list
- What did these commands do to your data?
4. Exit
- The data in the pythium directory is now stored in many ways in Allas, so we can remove the data from Puhti and log out:
rm -r pythium
exit
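If you want extra assurance before deleting local data, one option is to remove the directory only after a verification command succeeds. This is a sketch rather than part of the exercise: remove_if_verified is a hypothetical helper, and the suggested check (rclone check against the bucket from upload case 1) assumes that copy is the one you want to rely on:

```shell
# Remove a directory only if the given verification command succeeds.
# $1 = directory to remove, remaining arguments = command to run as the check
remove_if_verified() {
    dir=$1
    shift
    if "$@"; then
        rm -r -- "$dir"
    else
        echo "Verification failed, keeping $dir" >&2
        return 1
    fi
}

# Possible use on Puhti (needs an active Allas connection):
#   remove_if_verified pythium rclone check pythium "allas:$USER-genomes-rc/"
```

Here rclone check compares the files in the local directory with the objects in the bucket and returns a non-zero exit status if they differ, in which case the directory is left in place.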
5. Downloading data from Allas to Puhti
- Log in to Puhti and move to your personal directory in your project’s /scratch:
ssh <username>@puhti.csc.fi # replace <username> with your CSC username
cd /scratch/<project>/$USER # replace <project> with your CSC project, e.g. project_2001234
- In Puhti, check your projects with the command:
csc-workspaces
- Set up the Allas connection:
module load allas
allas-conf
- Then run the commands (we will use the same bucket that was created earlier):
a-list
rclone lsd allas:
# replace <project number> with your project number, e.g. 2001234
a-list <project number>-$USER-genomes-ap
rclone ls allas:<project number>-$USER-genomes-ap
a-find pythium_vexans.fasta
a-find -a pythium_vexans.fasta
- Next, download the data in different ways:
1. Download with rclone
mkdir rclone_dir
cd rclone_dir/
- Copy everything:
mkdir all
rclone ls allas:<project number>-$USER-genomes-ap
rclone copyto -P allas:<project number>-$USER-genomes-ap all/
ls all
- Copy a set of objects:
mkdir vexans
rclone copyto allas:$USER-genomes-rc/pythium_vexans vexans/
ls vexans
- Copy just one object:
rclone copyto allas:$USER-genomes-rc/pythium_vexans/pythium_vexans.fasta ./vexans.fasta
ls
2. Download with a-get
- Return to your $USER directory under your project’s /scratch on Puhti (the pwd command should print /scratch/<project>/$USER):
cd ..
pwd
- Make a new directory:
mkdir a_dir
cd a_dir/
- Create a directory all and move there:
mkdir all
cd all
- List your default SCRATCH bucket (replace <project number> with your project number, e.g. 2001234):
a-list <project number>-puhti-SCRATCH
a-list <project number>-puhti-SCRATCH/$USER
- Look for the file pythium_vexans.fasta in your Puhti SCRATCH bucket:
a-find pythium_vexans.fasta -b <project number>-puhti-SCRATCH # replace <project number> with your project number, e.g. 2001234
- Download the full dataset with command:
a-get <project number>-puhti-SCRATCH/$USER/pythium.tar # replace <project number> with your project number, e.g. 2001234
- Check what you got:
ls -l
ls -R
- Now, download just a single genome dataset:
cd ..
a-get <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar # replace <project number> with your project number, e.g. 2001234
ls -l pythium/
ls -l pythium/pythium_vexans/
3. Downloading data from allas-backup
- Return to your main scratch directory and make a new directory:
cd ..
mkdir a_backup
cd a_backup/
- Use the commands below to find out the ID of the most recent backup version of your pythium directory:
allas-backup list
allas-backup list | grep $USER
- Use allas-backup restore to download the data:
allas-backup restore <id string> # replace <id string> with the ID of your backup snapshot
ls -l
ls -l pythium