Where to store files in CSC’s computing environment?
In this tutorial you will:
- Familiarize yourself with personal and project-specific disk areas and their quotas on CSC supercomputers.
- Learn how to share your files, such as software installations and data, with other project members on CSC supercomputers.
💬 Each user of the CSC supercomputers (Puhti and Mahti) has access to several disk areas (directories) for managing their data. Each disk area has its own specific purpose.
💬 Active data files needed for computational simulations and analyses should be stored and shared in directories under /scratch, while any software installations and binaries should be shared under the /projappl directory.
Identify your personal and project-specific directories on Puhti and Mahti supercomputers
- First login to Puhti using SSH (or by opening a login node shell in the Puhti web interface):
ssh <username>@puhti.csc.fi # replace <username> with your CSC username, e.g. myname@puhti.csc.fi
- Get an overview of your projects and directories by running the following commands on the login node:
csc-projects
csc-workspaces
- Inspect the output information summarizing your directories and their current quotas.
- Visit your project’s /scratch directory and list its contents:
cd /scratch/<project>/ # replace <project> with your CSC project, e.g. project_2001234
ls
- Visit your project’s /projappl directory and list its contents:
cd /projappl/<project>/ # replace <project> with your CSC project, e.g. project_2001234
ls
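Instead of retyping the <project> placeholder in every command, you can store the project name in a shell variable once and build both paths from it. In this sketch the variable names PROJECT, SCRATCH_DIR and PROJAPPL_DIR are illustrative choices (not CSC conventions), and project_2001234 is only the example ID used in this tutorial; replace it with one of the projects listed by csc-projects. The loop only lists a directory if it exists, so a mistyped project name produces a message instead of an error.

```bash
# Hypothetical example project ID: replace it with your own project from csc-projects.
PROJECT=project_2001234

# Build the two project-specific paths from the single variable.
SCRATCH_DIR="/scratch/$PROJECT"
PROJAPPL_DIR="/projappl/$PROJECT"

for dir in "$SCRATCH_DIR" "$PROJAPPL_DIR"; do
    if [ -d "$dir" ]; then
        echo "Contents of $dir:"
        ls "$dir"
    else
        echo "Directory not found: $dir (check the project name)"
    fi
done
```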
💬 These directories can be briefly summarized as follows:
- User-specific directory (i.e. your personal home folder)
  - Your home directory (path stored in the environment variable $HOME)
    - The default directory when you log in to Puhti/Mahti
    - You can store configuration files and other minor data for personal use
- Project-specific directories
  - The project’s /scratch and /projappl directories
    - Each project has its own /scratch disk space where most computational tasks are performed. The /scratch area is temporary space not intended for long-term data storage! Please move inactive data to e.g. Allas.
    - The /projappl directory, on the other hand, is mainly for storing compiled applications, libraries etc. and sharing them with other members of the project.
Sharing binaries and data files
💬 Data transfer between two supercomputers can be done e.g. with rsync.
Download the example files
☝🏻 In this example you will download data from the Allas object storage, but keep in mind that you should avoid using Allas to transfer data between Puhti and Mahti.
- Move to your home folder:
cd
💡 If you know the files are large, you should consider downloading them directly to /scratch.
- Download and extract an archive from the Allas object storage that contains an example program package (ggplot2_3.3.3_Rprogramme.tar.gz) and a data file (Merged.fasta):
wget https://a3s.fi/CSC_training/shared_files.tar.gz
tar -xavf shared_files.tar.gz
cd shared_files
Let’s assume that Merged.fasta is a data file intended for computational use and ggplot2_3.3.3_Rprogramme.tar.gz is a software tool needed for the analysis.
Move the files to Puhti /scratch and /projappl
- Create folders named after your username (using the environment variable $USER) in your project directories under /scratch and /projappl on Puhti:
mkdir -p /projappl/<project>/$USER # replace <project> with your CSC project, e.g. project_2001234
mkdir -p /scratch/<project>/$USER # replace <project> with your CSC project, e.g. project_2001234
- Copy your ggplot2_3.3.3_Rprogramme.tar.gz file to the /projappl directory:
cp ggplot2_3.3.3_Rprogramme.tar.gz /projappl/<project>/$USER/ # replace <project> with your CSC project, e.g. project_2001234
- Copy the Merged.fasta file to the /scratch directory:
cp Merged.fasta /scratch/<project>/$USER/ # replace <project> with your CSC project, e.g. project_2001234
- Note that all new files and directories are also fully accessible to other members of the project (including read and write permissions).
- Set read-only permissions for your project members on the file Merged.fasta:
cd /scratch/<project>/$USER/ # replace <project> with your CSC project, e.g. project_2001234
chmod g-w Merged.fasta # g-w "subtracts" write permission from users belonging to our group (g), i.e. our project
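To see exactly what g-w changes, you can try it on a throwaway file first. This sketch creates a temporary directory with mktemp, gives a dummy Merged.fasta group write permission explicitly (mimicking a group-writable project file), and prints the symbolic permissions before and after with GNU stat, which is available on Linux systems such as Puhti (macOS stat uses different options). The dummy file is only a stand-in; no real project data is touched.

```bash
# Work in a throwaway directory so no real project data is touched.
tmp=$(mktemp -d)
touch "$tmp/Merged.fasta"

# Start from group-writable permissions: read/write for owner and group, read for others.
chmod 664 "$tmp/Merged.fasta"
before=$(stat -c '%A' "$tmp/Merged.fasta")

# Remove write permission for the group (the project members), as in the step above.
chmod g-w "$tmp/Merged.fasta"
after=$(stat -c '%A' "$tmp/Merged.fasta")

echo "before: $before"   # -rw-rw-r--
echo "after:  $after"    # -rw-r--r--

# Remove the temporary directory again.
rm -r "$tmp"
```

The owner keeps write access, while project members can still read the file. To apply the same change to a whole directory tree, chmod -R g-w <directory> works recursively.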
Copying files from Puhti to Mahti (optional task, requires access to Mahti)
- Change to the folder where you have the example files
- Copy the Merged.fasta file from Puhti to the /scratch disk area of Mahti:
rsync -P Merged.fasta <username>@mahti.csc.fi:/scratch/<project>/$USER/ # replace <username> with your CSC username and <project> with your CSC project, e.g. project_2001234
- Copy the ggplot2_3.3.3_Rprogramme.tar.gz file from Puhti to the /projappl directory on Mahti:
rsync -P ggplot2_3.3.3_Rprogramme.tar.gz <username>@mahti.csc.fi:/projappl/<project>/$USER/ # replace <username> with your CSC username and <project> with your CSC project, e.g. project_2001234
More information
💡 Hint: You can use your folder under /scratch for the rest of the tutorials. You can save its path in an alias (e.g. one that runs cd or echo with the path) or in your notes.
💡 Sometimes you need to store the paths of the /scratch or /projappl directories in environment variables. Variables exported this way remain set until you log out. This can be done with the following commands:
export PROJAPPL=/projappl/<project>/ # replace <project> with your CSC project, e.g. project_2001234
export SCRATCH=/scratch/<project>/ # replace <project> with your CSC project, e.g. project_2001234
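If you switch between projects, a small helper function (for example in your ~/.bashrc, since bash is the shell used here) can set both variables at once. The function name set_project_paths is hypothetical and not a CSC tool, and the check that the argument looks like project_<digits> is an assumption based on the project_2001234 example format used in this tutorial.

```bash
# Hypothetical helper: export SCRATCH and PROJAPPL for the given project.
set_project_paths() {
    local project="$1"
    # Assumption: CSC project names follow the pattern project_<digits>.
    if [[ ! "$project" =~ ^project_[0-9]+$ ]]; then
        echo "usage: set_project_paths project_<digits>" >&2
        return 1
    fi
    export SCRATCH="/scratch/$project/"
    export PROJAPPL="/projappl/$project/"
    echo "SCRATCH=$SCRATCH"
    echo "PROJAPPL=$PROJAPPL"
}

# Example call with the placeholder project ID used throughout this tutorial.
set_project_paths project_2001234
```

After calling it, cd "$SCRATCH" takes you to the project’s /scratch directory. Like the plain export commands, the variables only last for the current session.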