Python
This tutorial is done on Puhti, which requires that:
- You have a user account at CSC.
- Your account belongs to a project that has access to the Puhti service.
💬 To run Python applications, first load a suitable Python module. CSC has several Python environments available with focus on different application areas, e.g. data science and machine learning, as well as geoinformatics.
💡 For more details, please see the Python page in Docs CSC.
💭 By selecting a suitable Python environment to start with, you’ll minimize the need to install additional packages.
☝🏻 Note that Conda environments should be containerized according to our usage policy. Use the Tykky container wrapper to accomplish this easily!
Installing Python packages
💬 To install simple packages, it is usually enough to use pip, for example:
pip install --user <package name> # Or pip3 to ensure use of Python 3
☝🏻 Remember to include --user. By default, pip tries to install to the system Python installation path, which will not work.
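💡 To see where a --user installation ends up for the interpreter you are running, you can ask Python itself. The snippet below is a minimal sketch that uses only the standard-library site module. The helper name describe_user_site is hypothetical, and the printed paths depend on your own environment.

```python
import site


def describe_user_site():
    """Return the user base directory and the user site-packages directory."""
    # site.getuserbase() respects the PYTHONUSERBASE environment variable.
    user_base = site.getuserbase()
    # Packages installed with `pip install --user` are placed in this directory.
    user_site = site.getusersitepackages()
    return user_base, user_site


if __name__ == "__main__":
    base, packages = describe_user_site()
    print(f"User base:          {base}")
    print(f"User site-packages: {packages}")
```

Run it with the same Python module loaded that you use for pip, since the site-packages path contains the Python version.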
🗯 For more complex installations you should create a containerized environment.
💡 See the Docs CSC pages for each Python module for more details, as there may be some environment-specific instructions.
Example: Installing a simple package with pip
💬 Let’s install a library called coverage.
- Start by loading a Python module and checking if the library is already installed:
  module load python-data
  python -c "import coverage"
- The error message indicates that the library is not available:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  ModuleNotFoundError: No module named 'coverage'
- Install the missing library:
  pip3 install --user coverage # This may take a while - don't worry!
- Re-test to see if the library is now available:
  python -c "import coverage"
  - This time there’s no error message, indicating that the import was successful!
- User libraries are installed by default under $HOME/.local. It is a good idea to change the installation folder, as the space in $HOME is limited. Set the variable before running pip install:
  export PYTHONUSERBASE=/path/to/another/installdir/ # /projappl is recommended
- To uninstall the package:
  pip3 uninstall coverage
  - Type y to confirm.
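💡 The import test in the steps above can also be done from within Python without triggering a traceback. The following is a minimal sketch using the standard-library importlib.util.find_spec function. The helper names is_installed and where_installed are hypothetical, and the sketch only handles top-level module names such as coverage.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def where_installed(module_name):
    """Return the location a module would be loaded from, or None if missing."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # For regular packages this is the path of the package's __init__.py file.
    return spec.origin


if __name__ == "__main__":
    for name in ("json", "coverage"):
        print(f"{name}: installed={is_installed(name)}, origin={where_installed(name)}")
```

If the reported location is under your user base directory, the package comes from your own --user installation rather than from the loaded module.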
‼️ Note, if the package you installed also contains executable files, i.e. a command-line interface, these commands may not work as is! This is because the Python modules provided by CSC are containerized and the user-installed binaries will refer to an inaccessible Python path inside the container. For workaround instructions, see our Python documentation, or install your own environment from the beginning inside a container as outlined in the following example.
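💡 One way to check whether a user-installed command is affected is to look at the interpreter path on the first line (the shebang) of the script. The sketch below is an illustration under assumptions, not CSC's official workaround: the helper names are hypothetical, and the example path assumes the default user base $HOME/.local and the coverage package from the previous example.

```python
from pathlib import Path


def read_shebang(script_path):
    """Return the interpreter line of a script, or None if it has no shebang."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].decode("utf-8", errors="replace").strip()


def interpreter_exists(script_path):
    """Check whether the interpreter named in the shebang is visible from here."""
    shebang = read_shebang(script_path)
    if not shebang:
        return False
    # The shebang may carry arguments, e.g. "/usr/bin/env python3".
    interpreter = shebang.split()[0]
    return Path(interpreter).exists()


if __name__ == "__main__":
    # Assumed location of a user-installed command-line tool.
    script = Path.home() / ".local" / "bin" / "coverage"
    if script.exists():
        print("Shebang:", read_shebang(script))
        print("Interpreter reachable:", interpreter_exists(script))
    else:
        print(f"{script} not found")
```

If the interpreter is reported as not reachable, the script points to a Python path that only exists inside the container.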
Example: Containerizing a Conda environment with Tykky
💬 Let’s create a containerized Conda environment using the Tykky wrapper.
- Create a folder under your project’s /projappl directory for the installation, e.g.:
  mkdir -p /projappl/<project>/$USER/tykky-env # replace <project> with your CSC project, e.g. project_2001234
- Create an env.yml environment file defining the packages to be installed. Using, for example, nano, copy and paste the following contents into the file:
  channels:
    - conda-forge
  dependencies:
    - python=3.10.8
    - scipy
    - pandas
    - nglview
- Purge your current module environment and load the Tykky module:
  module purge
  module load tykky
- Create and containerize the Conda environment using the conda-containerize command:
  conda-containerize new --prefix /projappl/<project>/$USER/tykky-env env.yml # replace <project> with your CSC project, e.g. project_2001234
  ☝🏻 This process can take several minutes, so be patient.
- As instructed by Tykky, add the path to the installation bin directory to your $PATH:
  export PATH="/projappl/<project>/$USER/tykky-env/bin:$PATH" # replace <project> with your CSC project, e.g. project_2001234
  💡 Adding this to your $PATH allows you to call Python and all other executables installed by Conda in the same way as if you had activated a non-containerized Conda environment.
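💡 To confirm which executable a command name resolves to once a directory has been prepended to $PATH, you can use shutil.which from the Python standard library. This is a minimal sketch. The helper name resolve_command and its extra_dir parameter are hypothetical, and the function only mimics the lookup order without changing your shell environment.

```python
import os
import shutil


def resolve_command(command, extra_dir=None):
    """Return the full path of the executable that `command` would run."""
    search_path = os.environ.get("PATH", "")
    if extra_dir is not None:
        # Prepend the directory, mirroring export PATH="<dir>:$PATH".
        search_path = os.pathsep.join([extra_dir, search_path])
    return shutil.which(command, path=search_path)


if __name__ == "__main__":
    # Shows which python3 is found first on the current $PATH.
    print(resolve_command("python3"))
```

Passing the bin directory of the Tykky installation as extra_dir should return a path inside that directory for python if the installation succeeded.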
💭 The above Conda installation would create more than 40k files if installed directly on the parallel file system. Containerizing the environment with Tykky decreases this to less than 200, thus avoiding Lustre performance issues.
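💡 You can check the file count of any installation directory yourself. The sketch below walks a directory tree with os.walk and counts every entry that is not a directory. The helper name count_files is hypothetical, and the numbers you get will depend on the packages and versions installed.

```python
import os


def count_files(root):
    """Count all non-directory entries below `root`, recursively."""
    total = 0
    # os.walk does not follow symbolic links to directories by default.
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    # Counts the files under the current working directory.
    print(count_files(os.getcwd()))
```

Calling count_files on the Tykky installation directory gives the number of files that the environment actually places on the parallel file system.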
💬 To modify an existing Tykky-based Conda environment, you can use the update sub-command of conda-containerize together with the --post-install option to specify a bash script with commands to run to update the installation. See more details in Docs CSC.