In this tutorial you will learn:
- About the biokit module
- How to search for applications
- How to install Bioconda packages
💬 Let’s imagine that we have some sequencing data that we wish to align to a reference genome and check the quality of the alignment.
💡 The biokit module loads a set of commonly used bioinformatics tools.
module spider hisat2
☝🏻 Not all software installed on CSC’s supercomputers has its own documentation page in the application list (yet). Some applications may be new installations, or may have been installed at the request of a single research group.
Load the biokit module and see what is included:
module load biokit
module list
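Once a module is loaded, a quick way to confirm that a particular tool is really available is to ask the shell whether it can find the command. The following is a minimal sketch using the standard `command -v` builtin; the function name `check_tools` is hypothetical, and the tool names in the example call (`hisat2`, and `bam_stat.py` from RSeQC) are only illustrations that you should replace with the commands you need.

```shell
# Report whether each named command can be found on the current PATH.
# Returns a non-zero status if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (tool names are illustrative, adjust to your workflow).
check_tools hisat2 bam_stat.py || echo "Some tools are not on PATH; try 'module spider <name>'."
```

If a tool is reported as missing, searching for it with module spider is the natural next step.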
💬 Let’s imagine you have just successfully aligned the sequence data.
💬 As you can see from the module list command above, the RSeQC tool is not included in the biokit module.
module spider rseqc
Bioconda is a popular Conda channel for bioinformatics software. It provides an easy method to install thousands of software packages related to biomedical research. Conda environments are, however, problematic on supercomputers with parallel file systems since they create too many files. The solution is to use containerized environments.
module spider metabat2
docker pull quay.io/biocontainers/metabat2:<tag>
module purge
module load tykky
mkdir -p /projappl/<project>/$USER/metabat-2.15 # replace <project> with your CSC project, e.g. project_2001234
wrap-container -w /usr/local/bin docker://quay.io/biocontainers/metabat2:2.15--h986a166_1 --prefix /projappl/<project>/$USER/metabat-2.15 # replace <project> with your CSC project, e.g. project_2001234
The -w option specifies the installation directory inside the container. For containers from Bioconda this is always /usr/local/bin.
The --prefix option indicates the directory where we want to install the software.
💡 After the installation finishes, the executables of the program will be in the directory metabat-2.15/bin. Note that these are not the actual binaries, but rather wrapper scripts for the executables inside the container. You can, however, use them as if they were the actual commands.
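To see which commands the installation exposes, you can list the executable files in that bin directory. This is a small sketch in plain shell; the function name `list_executables` is hypothetical, and the path in the example call assumes the installation prefix used earlier (replace `<project>` with your CSC project).

```shell
# Print the names of the executable regular files in a directory, one per line.
# Prints nothing if the directory is missing or contains no executables.
list_executables() {
    for f in "$1"/*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename "$f"
        fi
    done
}

# Example call; the path assumes the prefix chosen during installation.
list_executables "/projappl/<project>/$USER/metabat-2.15/bin"
```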
Add the bin directory to your $PATH as suggested by Tykky. This is analogous to activating a Conda environment after a direct Conda installation, and it lets you run the commands from anywhere without giving the full path to the binaries:
export PATH="/projappl/<project>/$USER/metabat-2.15/bin:$PATH" # replace <project> with your CSC project, e.g. project_2001234
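If the export line ends up in a file that is sourced more than once, the same directory is added to `$PATH` repeatedly. One way to avoid that is sketched below; the helper name `prepend_path` is hypothetical and not part of Tykky, and the directory is the same assumed installation prefix as above.

```shell
# Prepend a directory to PATH only if it is not already listed there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # not present, put it first
    esac
    export PATH
}

# Replace <project> with your CSC project, e.g. project_2001234.
prepend_path "/projappl/<project>/$USER/metabat-2.15/bin"
```

Calling the function again with the same directory leaves `$PATH` untouched.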
💬 Make sure to load all necessary modules and export any required paths in your batch scripts as well, before launching any actual commands. It is good practice to start with module purge to ensure that you are working in a clean environment.
☝🏻 Note that if you are writing a batch script that uses applications from different modules, you should be mindful of the order in which you load (and possibly unload) the modules. Loading one module might automatically replace other ones to avoid conflicts.
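The advice above can be put together in a small batch script. The sketch below writes such a script to a file; the file name `metabat_job.sh`, the job name, the partition `small` and the resource values are all assumptions for illustration, and `metabat2 --help` is only a placeholder for a real analysis command.

```shell
# Write an example batch script to the current directory.
# The quoted 'EOF' keeps $USER and $PATH unexpanded until the job runs.
cat > metabat_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=metabat_demo
#SBATCH --account=<project>
#SBATCH --partition=small
#SBATCH --time=00:15:00
#SBATCH --mem=4G

# Start from a clean environment.
module purge

# Load modules in a deliberate order; this line is only needed
# if the job also uses tools from the biokit module.
module load biokit

# Make the Tykky wrapper scripts available (replace <project>).
export PATH="/projappl/<project>/$USER/metabat-2.15/bin:$PATH"

# Placeholder command; replace with your actual analysis.
metabat2 --help
EOF

echo "Wrote metabat_job.sh"
```

After replacing `<project>` and the placeholder command, the script would be submitted with `sbatch metabat_job.sh`.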