💬 To run Python applications, first load a suitable Python module. CSC provides several Python environments, each with a focus on a different application area, e.g. data science and bio-/geoinformatics.
💡 For details, please see the Python page in Docs CSC.
💭 By selecting a suitable Python environment to start with, you’ll minimize the need to install additional packages.
💬 To install simple packages it is usually enough to use pip, for example:
pip install --user <package name> # Or pip3 to ensure use of Python 3
☝🏻 Remember to include --user. By default, pip tries to install to the system Python installation path, which will not work.
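To see where a --user installation ends up, Python's standard `site` module can report the per-user locations. The following is a minimal sketch; the helper name `user_install_locations` is only illustrative.

```python
import site


def user_install_locations():
    """Return the per-user base directory and its site-packages directory."""
    # getuserbase() honours PYTHONUSERBASE and otherwise falls back to
    # the platform default (~/.local on Linux).
    return {
        "user_base": site.getuserbase(),
        "user_site": site.getusersitepackages(),
    }


for name, path in user_install_locations().items():
    print(f"{name}: {path}")
```

Packages installed with --user are placed in the `user_site` directory, and any scripts they ship go to the `bin` folder under `user_base`.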
🗯 For more complex installations you should create a containerized environment.
💡 See the Python documentation pages for each Python environment, as there might be some environment-specific instructions.
💬 Let’s install a library called coverage. First load the python-data module and check whether the library can already be imported:
module load python-data
python -c "import coverage"
☝🏻 The error message indicates that the library is not available:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'coverage'
Install the library with pip and try the import again:
pip3 install --user coverage # This may take a while - don't worry!
python -c "import coverage"
💡 This time there’s no error message, indicating that the import was successful.
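The same check can be done without provoking a traceback. The sketch below uses `importlib.util.find_spec` from the standard library to test whether a top-level module can be located; the helper name `is_installed` is an illustrative choice, not part of any CSC tooling.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "coverage"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

This only works reliably for top-level names such as `coverage`; for dotted names, `find_spec` imports the parent package first and raises an error if the parent is missing.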
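With --user, pip installs packages under $HOME/.local by default. To change the installation folder, set the PYTHONUSERBASE environment variable before running pip.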
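Python reads the user installation base from the PYTHONUSERBASE environment variable when it is set. As a rough illustration, the sketch below starts a fresh interpreter with the variable overridden and asks it which user base it would use. The function name and the example folder are hypothetical.

```python
import os
import subprocess
import sys


def user_base_with_override(new_base):
    """Ask a fresh interpreter which user base it uses under PYTHONUSERBASE."""
    env = dict(os.environ, PYTHONUSERBASE=new_base)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical target folder; replace <project> with your CSC project.
print(user_base_with_override("/projappl/<project>/my-python-env"))
```

The folder does not need to exist for this check. With the variable exported in the shell, pip install --user would use the same base.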
To uninstall the library:
pip3 uninstall coverage
☝🏻 Note that if the package you installed also contains executable files, these may not work. This is because the Python modules provided by CSC are containerized, and the user-installed executables will refer to a Python path inside the container that is not accessible from outside it. For workaround instructions, see our Python documentation, or install your own environment from scratch inside a container as outlined in the following example.
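One way to spot affected executables is to look at the interpreter named on their first line (the shebang) and check whether that path exists outside the container. This is a minimal sketch under that assumption; the helper names are illustrative, and it does not replace the workaround described in the documentation.

```python
import os
from pathlib import Path


def shebang_interpreter(script_path):
    """Return the interpreter path from a script's shebang line, or None."""
    with open(script_path, "rb") as handle:
        # Limit the read so large binary files are not loaded in full.
        first_line = handle.readline(1024).decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def find_broken_scripts(bin_dir):
    """List (name, interpreter) for scripts whose interpreter path does not exist."""
    broken = []
    for entry in sorted(Path(bin_dir).iterdir()):
        if not entry.is_file():
            continue
        interpreter = shebang_interpreter(entry)
        if interpreter and not os.path.exists(interpreter):
            broken.append((entry.name, interpreter))
    return broken
```

For example, `find_broken_scripts(Path.home() / ".local" / "bin")` would list the user-installed scripts whose interpreter cannot be found from the login shell.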
💬 Let’s create a containerized Conda environment using the Tykky wrapper.
Create a new directory under your project's /projappl directory for the installation, e.g.:
mkdir -p /projappl/<project>/$USER/tykky-env # replace <project> with your CSC project, e.g. project_2001234
Create an env.yml environment file defining the packages to be installed. Using, for example, nano, copy/paste the following contents to the file:
channels:
  - conda-forge
dependencies:
  - python=3.10.8
  - scipy
  - pandas
  - nglview
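If you prefer to generate the environment file from a script rather than typing it in an editor, something along these lines would produce the same contents. This is a sketch only; the function `write_env_file` is a hypothetical helper, not part of Tykky or Conda.

```python
from pathlib import Path


def write_env_file(path, channels, dependencies):
    """Write a minimal Conda environment file with the given channels and packages."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    target = Path(path)
    target.write_text("\n".join(lines) + "\n")
    return target
```

Calling `write_env_file("env.yml", ["conda-forge"], ["python=3.10.8", "scipy", "pandas", "nglview"])` writes the file shown above to the current directory.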
module purge
module load tykky
conda-containerize new --prefix /projappl/<project>/$USER/tykky-env env.yml # replace <project> with your CSC project, e.g. project_2001234
☝🏻 This process can take several minutes, so be patient.
Add the bin directory of the installation to your $PATH:
export PATH="/projappl/<project>/$USER/tykky-env/bin:$PATH" # replace <project> with your CSC project, e.g. project_2001234
💡 Adding this to your $PATH allows you to call Python and all other executables installed by Conda in the same way as if you had activated a non-containerized Conda environment.
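To confirm which executable a command name resolves to once a directory is placed first on $PATH, `shutil.which` from the standard library can be used. A small sketch, with `resolve_command` as an illustrative helper name:

```python
import os
import shutil


def resolve_command(command, extra_dir=None):
    """Return the executable `command` resolves to, with `extra_dir` searched first."""
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        # Prepending mirrors export PATH="<dir>:$PATH" in the shell.
        search_path = os.pathsep.join([str(extra_dir), search_path])
    return shutil.which(command, path=search_path)
```

For instance, `resolve_command("python", "/projappl/<project>/$USER/tykky-env/bin")`, with the placeholders replaced by real values, should return the wrapper inside the Tykky installation rather than the system Python.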
💭 The above Conda installation would create more than 40k files if installed directly on the parallel file system. Containerizing the environment with Tykky decreases this to less than 200, thus avoiding Lustre performance issues.
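To get a feel for how many files an installation directory contains, you could count them with a short script such as the one below. It is a sketch, with `count_files` as an illustrative name, and it simply walks the directory tree.

```python
import os


def count_files(root):
    """Count the non-directory entries found below `root`."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total
```

Keep in mind that walking a tree with tens of thousands of files is itself a metadata-heavy operation on Lustre, so run such a count sparingly.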
💬 To modify an existing Tykky-based Conda environment you can use the update keyword of conda-containerize together with the --post-install option to specify a bash script with commands to run to update the installation. See more details in Docs CSC.
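As a rough sketch of how such an update could be prepared, the helpers below write a post-install script and assemble the command line without running it. Passing the installation path as a positional argument after `update` is an assumption here, and the helper names are hypothetical; check Docs CSC for the exact syntax.

```python
import shlex
from pathlib import Path


def write_post_install(path, commands):
    """Write the bash commands to run during the update into a script file."""
    target = Path(path)
    target.write_text("\n".join(commands) + "\n")
    return target


def build_update_command(env_prefix, post_install_script):
    """Assemble (but do not run) a conda-containerize update command line."""
    # Assumption: the existing installation path is given positionally.
    return [
        "conda-containerize",
        "update",
        str(env_prefix),
        "--post-install",
        str(post_install_script),
    ]


command = build_update_command("/projappl/<project>/$USER/tykky-env", "post.sh")
print(shlex.join(command))
```

The printed line is quoted for safe display, so replace the placeholders with real values before using it in a shell with the tykky module loaded. The script itself would contain ordinary installation commands, e.g. a pip or conda install line.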