💬 To run Python applications, first load a suitable Python module. CSC has several Python environments available with focus on different application areas, e.g. data science and bio-/geoinformatics.
💡 For details, please see the Python page in Docs CSC.
💭 By selecting a suitable Python environment to start with, you’ll minimize the need to install additional packages.
☝🏻 Note that Conda environments should be containerized according to our usage policy. See the Tykky container wrapper to accomplish this easily!
💬 To install simple packages, it is usually enough to use pip, for example:
pip install --user <package name> # Or pip3 to ensure use of Python 3
☝🏻 Remember to include --user. By default, pip tries to install to the system Python installation path, which will not work.
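To see why --user matters, a small sketch along these lines compares the system site-packages directory with your per-user one. It uses only the standard library; the actual paths, and whether the system directory is writable, depend on which Python module you have loaded, so treat the output as illustrative.

```python
import os
import site
import sysconfig

# Directory where a plain `pip install` would try to put pure-Python packages.
system_site = sysconfig.get_paths()["purelib"]

# Directory that `pip install --user` targets instead.
user_site = site.getusersitepackages()

print("System site-packages:", system_site)
print("  writable by you:", os.access(system_site, os.W_OK))
print("User site-packages:", user_site)
```

If the system directory is reported as not writable, an installation without --user cannot place files there.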
🗯 For more complex installations you should create a containerized environment.
💡 See the Python documentation pages for each Python environment, as there might be some environment-specific instructions.
💬 Let’s install a library called coverage. First, load a Python module and check whether the library is already available:
module load python-data
python -c "import coverage"
☝🏻 The error message indicates that the library is not available:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'coverage'
Install the package with pip:
pip3 install --user coverage # This may take a while - don't worry!
Then try importing it again:
python -c "import coverage"
💡 This time there’s no error message, indicating that the import was successful.
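If you prefer an explicit yes/no answer over reading a traceback, the same check can be sketched with the standard library's importlib. The helper name is_installed is our own and is not part of any CSC tooling. It is meant for top-level module names such as coverage.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "coverage" depends on what you have installed.
for name in ("coverage", "json"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```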
By default, packages installed with --user are placed under $HOME/.local. To change the installation folder:
export PYTHONUSERBASE=/path/to/your/preferred/installdir
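To confirm that Python picks up the new location, a minimal sketch like the following starts a fresh interpreter with PYTHONUSERBASE set and asks it for its user base directory. The path /tmp/my-python-installs is only a placeholder, and the helper name user_base_with is hypothetical.

```python
import os
import subprocess
import sys


def user_base_with(pythonuserbase: str) -> str:
    """Return the user base directory a new interpreter reports
    when PYTHONUSERBASE is set to the given path."""
    env = dict(os.environ, PYTHONUSERBASE=pythonuserbase)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder path; use your own preferred installation directory.
print(user_base_with("/tmp/my-python-installs"))
```

Packages installed with --user while the variable is set end up under this directory rather than under $HOME/.local.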
To uninstall the package:
pip3 uninstall coverage
Type y to confirm.
☝🏻 Note that if the package you installed also contains executable files, these may not work. This is because the Python modules provided by CSC are containerized and the user-installed binaries will refer to an inaccessible Python path inside the container. For workaround instructions, see our Python documentation, or install your own environment from scratch inside a container as outlined in the following example.
💬 Let’s create a containerized Conda environment using the Tykky wrapper.
Create a new directory under your project’s /projappl directory for the installation, e.g.:
mkdir -p /projappl/<project>/$USER/tykky-env # replace <project> with your CSC project, e.g. project_2001234
Create an env.yml environment file defining the packages to be installed. Using for example nano, copy/paste the following contents to the file:
channels:
  - conda-forge
dependencies:
  - python=3.10.8
  - scipy
  - pandas
  - nglview
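If you would rather create the file non-interactively than paste it into an editor, a minimal sketch is to write the same contents from Python. The function name write_env_file is hypothetical; the file contents are the ones listed above.

```python
from pathlib import Path

# Same environment definition as in the tutorial text.
ENV_YML = """\
channels:
  - conda-forge
dependencies:
  - python=3.10.8
  - scipy
  - pandas
  - nglview
"""


def write_env_file(path="env.yml"):
    """Write the environment definition to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(ENV_YML)
    return target


if __name__ == "__main__":
    print("Wrote", write_env_file())
```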
Purge your currently loaded modules and load Tykky:
module purge
module load tykky
Create the environment using the conda-containerize command:
conda-containerize new --prefix /projappl/<project>/$USER/tykky-env env.yml # replace <project> with your CSC project, e.g. project_2001234
☝🏻 This process can take several minutes so be patient.
Add the bin directory of the environment to your $PATH:
export PATH="/projappl/<project>/$USER/tykky-env/bin:$PATH" # replace <project> with your CSC project, e.g. project_2001234
💡 Adding this to your $PATH allows you to call Python and all other executables installed by Conda in the same way as if you had activated a non-containerized Conda environment.
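One way to check that the $PATH change took effect is to ask which python would be started first. The sketch below is only illustrative: ENV_BIN is a placeholder that you would replace with your own project and user name, and python_comes_from is a hypothetical helper.

```python
import shutil
import sys


def python_comes_from(prefix: str) -> bool:
    """Return True if the first `python` on $PATH lives under `prefix`."""
    found = shutil.which("python")
    return found is not None and found.startswith(prefix)


# Placeholder; substitute your own project and user name.
ENV_BIN = "/projappl/<project>/<user>/tykky-env/bin"

print("First python on $PATH:", shutil.which("python"))
print("Running interpreter:", sys.executable)
print("Comes from Tykky env:", python_comes_from(ENV_BIN))
```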
💭 The above Conda installation would create more than 40k files if installed directly on the parallel file system. Containerizing the environment with Tykky decreases this to less than 200, thus avoiding Lustre performance issues.
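To get a feel for such file counts yourself, a short sketch like this walks a directory tree and counts the entries that are not directories. The demonstration builds a tiny temporary tree so that it runs anywhere; the helper name count_files is our own.

```python
import os
import tempfile


def count_files(root: str) -> int:
    """Count non-directory entries below `root` (directory symlinks are not followed)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


# Small self-contained demonstration: two files at the top level, one in a subdirectory.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "sub"))
    for relative in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(demo_root, relative), "w") as handle:
            handle.write("example\n")
    demo_count = count_files(demo_root)

print(demo_count)  # 3
```

Pointing count_files at your own installation directory shows how many files it places on the file system.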
💬 To modify an existing Tykky-based Conda environment, you can use the update keyword of conda-containerize together with the --post-install option to specify a bash script with the commands to run to update the installation. See more details in Docs CSC.
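As a rough sketch of what such an update could look like, the helpers below generate the text of a post-install script and assemble the matching command line. The function names, the script name post-install.sh, the example package requests and the placeholder path are all assumptions made for illustration. The argument order follows our reading of the Tykky documentation, so check it against Docs CSC before relying on it.

```python
import shlex


def post_install_script(packages):
    """Return bash script text that pip-installs each package in turn."""
    return "".join(f"pip install {shlex.quote(p)}\n" for p in packages)


def update_command(env_prefix, script_path):
    """Return the conda-containerize update invocation as an argument list."""
    return ["conda-containerize", "update", env_prefix,
            "--post-install", script_path]


# Placeholder; substitute your own project and user name.
ENV_PREFIX = "/projappl/<project>/<user>/tykky-env"

# Script contents to save as post-install.sh (example package only).
print(post_install_script(["requests"]), end="")

# Command to run after `module load tykky`.
print(shlex.join(update_command(ENV_PREFIX, "post-install.sh")))
```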