This is an extra exercise which can not be run on Puhti. You will need access to a computer or virtual machine where you have root privileges and that has Apptainer installed.
On Puhti, you can use Tykky to easily containerize Conda environments. This method is recommended over the manual procedure detailed in this exercise, which is mainly provided for you to develop your skills in working with containers. For tutorials on using Tykky, see:
Conda is a useful tool for installing software with complex dependencies. It has, however, some problems, especially on HPC systems like Puhti with shared parallel file systems. Because of these issues, installing Conda environments directly on the file system of CSC supercomputers is not allowed (see usage policy).
The main problems of Conda environments are related to storage. Conda environments are quite large, containing tens to hundreds of thousands of files. Just 3-4 environments are enough to fill the basic quota of a project’s /projappl
directory. Moreover, many of these files will be accessed each time you launch a program installed with Conda, generating massive I/O load which may degrade the performance of the system for all users.
Conda environments can also be somewhat sensitive to changes in the base system, meaning that, e.g., updates on Puhti can sometimes break existing Conda environments, necessitating a re-install.
Using an Apptainer container can help with both problems. A container is just a single file that is typically smaller than the total size of the Conda environment directory. It is also less sensitive to changes in the host system.
It is relatively easy to containerize an existing Conda environment.
You should first check if the software package is already available as an Apptainer/Singularity or Docker container. The advantage of a ready-made container is that it can usually be pulled/converted with normal user privileges on Puhti.
You can find more detailed instructions on converting Docker containers in Docs CSC.
If you have an existing Conda environment, you can save the environment.yml
file and use it to replicate the environment in a container.
Please note that the environment.yml
file will only reflect changes to the environment made using conda
commands. If you have made any changes directly, you will need to replicate those changes in the definition file.
Make sure the environment you want to replicate is activated, and give the command:
conda env export > environment.yml
You can try with one of your own environments, or download an example to use for this exercise:
wget https://raw.githubusercontent.com/CSCfi/csc-env-eff/master/data/environment.yml
In addition to the environment.yml
file, you will need an Apptainer definition file. Create a file called conda_environment.def
with the following content (copy/paste).
Bootstrap: docker
From: continuumio/miniconda3
%files
environment.yml
%environment
%post
ENV_NAME=$(head -1 environment.yml | cut -d' ' -f2)
echo ". /opt/conda/etc/profile.d/conda.sh" >> $APPTAINER_ENVIRONMENT
echo "conda activate $ENV_NAME" >> $APPTAINER_ENVIRONMENT
. /opt/conda/etc/profile.d/conda.sh
conda env create -f environment.yml -p /opt/conda/envs/$ENV_NAME
conda clean --all
%runscript
exec "$@"
Make sure the files environment.yml
and conda_environment.def
are in the current directory and give the command:
sudo apptainer build fastx.sif conda_environment.def
This will build an Apptainer image file called fastx.sif
. We can now verify that it works:
apptainer exec fastx.sif fastq_to_fasta -h
The image file could now be transferred to and used on Puhti.
This particular environment was chosen because it is a good “bad example” of the effects different installation methods can have.
The software package is a collection of applications written in C++ with only a few dependencies. Usually, similar packages are best installed natively. In this case, however, the code is quite old, and it will not compile with modern versions of gcc
without some changes to the source code.
The software is available in the Bioconda repository, so it can also be installed with:
conda install fastx_toolkit
/projappl
is 100000 files, so this single installation would already use more than 25 % of that.Containerizing the Conda environment like we did in this exercise is better:
/projappl
is 50 GB, so this installation would only use less than 1 % of the quota.In this case there’s also another good option – converting a ready-made Docker container:
apptainer build fastx.sif docker://biocontainers/fastx-toolkit:v0.0.14-6-deb_cv1
Containers are not a “silver bullet” solution to all installation problems, but they are nonetheless a much more favorable alternative to direct Conda installations on HPC systems.