Upon completion of this tutorial, you will be familiar with the disk areas best suited to I/O-intensive workloads, i.e. workloads with frequent read and write operations.
💬 You may sometimes have to process a large number of small files, which can cause a heavy input/output load on the shared file system used in CSC’s computing environment.
💬 In order to facilitate such heavy I/O operations, CSC provides fast local disk areas on the login and compute nodes (excluding Mahti CPU nodes).
echo $TMPDIR
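Before staging data, it can help to confirm that the variable is set and to see how much space is left. The following is a minimal sketch; the variable name `local_disk` and the fallback to `/tmp` are assumptions made so the snippet also runs on machines where `$TMPDIR` is undefined, not something the CSC environment requires.

```shell
# Resolve the local disk path; /tmp is only a fallback assumption for
# machines where TMPDIR is not defined.
local_disk="${TMPDIR:-/tmp}"
echo "Local disk area: ${local_disk}"

# Show total, used and available space on the file system that holds it.
df -h "${local_disk}"
```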
💡 The local disk area on the login nodes is meant for light-weight pre-processing of data and I/O intensive tasks such as software compilation. Actual computations should be submitted to the batch queue from the /scratch disk.
💡 The local disk area on the login nodes is meant for temporary use and is cleaned often, so make sure to move important data to /scratch or /projappl once you no longer need the fast disk. Note that the local disk is specific to a particular node, i.e. you cannot access the local disk of puhti-login11 from puhti-login12.
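Because the data lives on one node only, it may be worth keeping temporary work in a single directory and noting which node holds it. This is a sketch under assumptions: the directory prefix `preprocess` and the variable names are hypothetical, and `/tmp` is only a fallback for machines without `$TMPDIR`.

```shell
# Create a private working directory on the local disk; /tmp is only a
# fallback assumption for machines where TMPDIR is not set.
work_dir="$(mktemp -d "${TMPDIR:-/tmp}/preprocess.XXXXXX")"

# Note the node name: the directory exists only on this node.
node_name="$(uname -n)"
echo "Working directory ${work_dir} lives on ${node_name}"
```

Keeping everything under one directory makes it easier to copy the important results to /scratch or /projappl in a single step later.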
cd $TMPDIR
wget https://a3s.fi/CSC_training/Individual_files.tar.gz
tar -xavf Individual_files.tar.gz
cd Individual_files
find . -name 'individual.fasta*' | xargs cat >> Merged.fasta
find . -name 'individual.fasta*' | xargs rm
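The merge-and-delete step above can be made a little more defensive. The sketch below runs on throwaway demo data so it can be tried anywhere; the directory name `merge_demo`, the three sample files, and the assumption that every input file contains exactly one FASTA header are illustrative and not taken from the tutorial data.

```shell
# Demo data: three tiny FASTA files in a throwaway directory (hypothetical
# content standing in for the unpacked Individual_files directory).
demo_dir="$(mktemp -d "${TMPDIR:-/tmp}/merge_demo.XXXXXX")"
cd "${demo_dir}"
for i in 1 2 3; do
    printf '>seq%s\nACGT\n' "$i" > "individual.fasta.$i"
done

# Count the inputs before merging.
n_files="$(find . -type f -name 'individual.fasta*' | wc -l)"

# Merge with NUL-delimited names so unusual file names cannot break xargs.
# Using > instead of >> avoids duplicated content if the step is re-run.
find . -type f -name 'individual.fasta*' -print0 | xargs -0 cat > Merged.fasta

# Delete the inputs only if the merged file has one header per input file
# (assumes one sequence per input file).
n_headers="$(grep -c '^>' Merged.fasta)"
if [ "${n_headers}" -eq "${n_files}" ]; then
    find . -type f -name 'individual.fasta*' -delete
else
    echo "Found ${n_headers} headers for ${n_files} files; keeping inputs" >&2
fi
```

As with the original commands, `find` does not guarantee any particular order, so the sequences in Merged.fasta may appear in arbitrary order.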
☝🏻 If you intend to perform heavy computing tasks using a large number of small files, you have to use the fast local disk areas on the compute nodes instead of the login nodes. The compute nodes are accessed either interactively or using batch jobs.
echo $LOCAL_SCRATCH
echo $TMPDIR
Use $LOCAL_SCRATCH in your batch job scripts to access the local storage on that node (only on Puhti).
Move your data to the /scratch area before analysis
💭 Remember: the commands csc-projects and csc-workspaces reveal information about your projects.
Create a folder named after your username ($USER) under a project-specific directory on the /scratch disk (or skip this step if you already created the folder in a previous tutorial).
mkdir -p /scratch/<project>/$USER/ # replace <project> with your CSC project, e.g. project_2001234
Move your pre-processed data (the Merged.fasta file) from the fast disk to /scratch:
mv Merged.fasta /scratch/<project>/$USER/ # replace <project> with your CSC project, e.g. project_2001234
You have now moved your data to the /scratch area and can start performing actual analysis using batch job scripts.
💡 Hint: You can use your folder under /scratch for the rest of the tutorials. You can save the path using an alias (with cd or echo) or somewhere in your notes.
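Moving a file from the local disk to /scratch crosses file systems, so the data is copied and then removed. If you want an explicit check in between, a small helper along these lines could be used. This is a sketch: the function name `safe_move` is hypothetical and not part of CSC's tooling.

```shell
# safe_move FILE DEST_DIR: copy FILE into DEST_DIR, verify the copy byte
# by byte, and delete the original only if the two files are identical.
safe_move() {
    local src="$1" dest_dir="$2"
    local dest="${dest_dir}/$(basename "${src}")"

    mkdir -p "${dest_dir}" || return 1
    cp "${src}" "${dest}" || return 1

    if cmp -s "${src}" "${dest}"; then
        rm "${src}"
    else
        echo "Copy of ${src} differs from the original; source kept" >&2
        return 1
    fi
}

# Example call (replace <project> with your CSC project):
# safe_move Merged.fasta "/scratch/<project>/$USER"
```

The helper creates the destination folder if it is missing, so a mistyped path produces a new directory rather than a renamed file.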
💡 It is sometimes necessary to export the paths of the /scratch or /projappl directories as environment variables (these last until logout). This can be done with the following commands:
export PROJAPPL=/projappl/<project>/ # replace <project> with your CSC project, e.g. project_2001234
export SCRATCH=/scratch/<project>/ # replace <project> with your CSC project, e.g. project_2001234
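Putting the pieces together, a batch job could stage data from /scratch onto the node-local disk, compute there, and copy the results back. The script below is a hedged sketch, not a CSC template: the `#SBATCH` values (partition `small`, `--gres=nvme:10` for 10 GB of local disk) are assumptions to check against the current Puhti documentation, `run_analysis` and `sequence_count.txt` are hypothetical names, and the header count is only a placeholder for a real analysis.

```shell
#!/bin/bash
#SBATCH --account=<project>   # replace <project> with your CSC project
#SBATCH --partition=small     # assumed partition name
#SBATCH --time=00:15:00
#SBATCH --gres=nvme:10        # assumed syntax: 10 GB of node-local disk

# run_analysis DATA_DIR WORK_DIR: stage Merged.fasta from DATA_DIR to the
# fast disk in WORK_DIR, analyse the local copy, and copy the result back.
run_analysis() {
    local data_dir="$1" work_dir="$2"

    # Stage the input from /scratch onto the fast node-local disk.
    cp "${data_dir}/Merged.fasta" "${work_dir}/" || return 1

    # Placeholder analysis: count the FASTA header lines.
    grep -c '^>' "${work_dir}/Merged.fasta" > "${work_dir}/sequence_count.txt"

    # Copy the result back to /scratch, since node-local storage is temporary.
    cp "${work_dir}/sequence_count.txt" "${data_dir}/"
}

# The fallbacks after ':-' only let the sketch run outside a batch job.
run_analysis "${SCRATCH:-/scratch/<project>}/${USER}" \
             "${LOCAL_SCRATCH:-${TMPDIR:-/tmp}}" \
    || echo "Staging failed; check the paths above" >&2
```

Whether variables exported in the login shell reach the job depends on how the job is submitted, so setting SCRATCH inside the script itself may be the safer choice.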