This guide presents an overview of managing Python packages and environments on the Longleaf and Dogwood clusters.

Table of Contents

System-wide versions of Python

User-specific package installations with pip

Dependency management with virtual environments

Creating virtual environments with venv and pip

Modifying the installation path

Creating virtual environments with conda

Using virtual environments in a SLURM job

Using virtual environments in a Jupyter notebook

System-wide versions of Python

On Longleaf and Dogwood, there are several pre-installed versions of Python available as modulefiles. To view the available versions of Python, run the following command from within a command-line session on Longleaf or Dogwood:

$ module avail python

------------------------------------------ /nas/longleaf/apps/lmod/modulefiles/Core -------------------------------------------
   python/2.4.6     python/2.7.13        python/3.6.6    python/3.7.14    python/3.9.6
   python/2.7.12    python/3.5.1  (D)    python/3.7.9    python/3.8.8

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

To load a specific version of Python to your environment, use the module load command. To identify the location of the currently-loaded python interpreter, you may use the which command. For example:

$ module load python/3.9.6
$ which python
/nas/longleaf/rhel8/apps/python/3.9.6/bin/python

The system-wide versions of Python often include common scientific packages such as numpy and scipy, and are sufficient for basic computations; however, most research projects will require users to install additional packages themselves using pip or conda.
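Before installing anything, you can check whether the loaded Python module already provides a package. The sketch below assumes only that a Python 3 interpreter is on your PATH (for example, after module load python/3.9.6). has_python_pkg and the PYTHON override variable are hypothetical names, not part of the modulefiles. The sketch calls importlib.util.find_spec from the standard library, which looks up a top-level package without importing it.

```shell
# Hypothetical helper: exit status 0 if the interpreter can find the named
# top-level package, 1 otherwise. PYTHON defaults to "python".
has_python_pkg() {
  # find_spec returns None when no installed module matches the name
  "${PYTHON:-python}" -c \
    'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' \
    "$1"
}
```

For example, has_python_pkg scipy && echo found prints found only when the loaded interpreter can see scipy.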

User-specific package installations with pip

There are numerous open-source packages available for download through the Python Package Index (PyPI). Pip is a package manager that installs packages from PyPI through a command-line interface. Using the pandas package as an example, users can install the latest release of the package to their home directory from within a command-line session on Longleaf using the following commands:

$ module load python/3.9.6
$ python -m pip install --user pandas

Because users only have write access to specific locations on Longleaf, such as their home directories, it is necessary to include the --user flag when running pip outside of a virtual environment. In the above example, pandas will be installed to the user-specific site-packages directory for Python 3.9 (~/.local/lib/python3.9/site-packages).
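The --user location depends only on the major and minor version of the interpreter. Packages installed under python/3.9.6 are therefore not visible to, for example, python/3.8.8. The sketch below derives that directory from a version string. user_site_dir is a hypothetical helper that assumes PYTHONUSERBASE is unset and that the standard Linux layout ~/.local/lib/pythonX.Y/site-packages applies. On the cluster, python -m site --user-site prints the authoritative path for the loaded interpreter.

```shell
# Hypothetical helper: print the per-user site-packages directory that
# "pip install --user" targets for a given Python version (e.g. 3.9.6).
# Assumes PYTHONUSERBASE is unset, so the user base is ~/.local.
user_site_dir() {
  local major_minor
  # Keep only the first two dot-separated fields: 3.9.6 -> 3.9
  major_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
  printf '%s/.local/lib/python%s/site-packages\n' "$HOME" "$major_minor"
}
```

For example, ls "$(user_site_dir 3.9.6)" lists what has been installed with --user for Python 3.9.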

Pip can be used to install specific package versions using the == operator:

$ python -m pip install --user pandas==2.0.2

Similarly, minimum package version requirements can be specified using the >= operator. Put the requirement in quotes so that the shell does not treat > as an output redirect:

$ python -m pip install --user "pandas>=2.0.2"
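Quoting matters because > is an output redirect in bash. Suppose you type the unquoted command python -m pip install --user pandas>=2.0.2. The shell passes only pandas to pip and writes pip's output to a file named =2.0.2 in the current directory, so the version constraint is silently dropped. The hypothetical function below reproduces this in a throwaway directory, using echo in place of pip.

```shell
# Hypothetical demo: show what bash does with an unquoted >= requirement.
show_redirect_pitfall() {
  local tmp
  tmp=$(mktemp -d)
  (
    cd "$tmp" || exit 1
    # Unquoted: bash reads ">=2.0.2" as "redirect stdout to the file =2.0.2",
    # so echo only receives the argument "pandas"
    echo pandas>=2.0.2
    # Quoted: the full requirement reaches the command as one argument
    echo "pandas>=2.0.2" > quoted.txt
    echo "unquoted argument: $(cat -- =2.0.2)"
    echo "quoted argument:   $(cat quoted.txt)"
  )
  rm -rf -- "$tmp"
}
```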

For more information on installing packages using pip, please refer to the pip user guide.

Dependency management with virtual environments

In order to manage dependency requirements for multiple projects, we strongly recommend that users install their local packages into isolated "virtual" Python environments. A Python virtual environment is a directory structure that contains all the necessary executables and packages needed to build and run a Python-based project. Using separate virtual environments for each project allows users to install and upgrade Python packages as needed without having to worry about creating dependency conflicts in their other projects. In addition, using virtual environments helps to improve scientific reproducibility by allowing others to more easily replicate the computational environment in which an analysis was conducted.

Creating virtual environments with venv and pip

Venv is a lightweight tool for creating virtual environments that is included as part of the Python 3 Standard Library. To create a new virtual environment using venv, first load one of the Python 3 modulefiles from within a command-line session on Longleaf, then invoke the venv command followed by the path to the directory where you would like the environment to be created.

$ module purge 
$ module load python/3.9.6
$ python -m venv ~/<env_name>

In the above example, a new virtual environment with Python 3.9.6 is created in the user's home directory. This environment is created on top of the existing Python installation associated with the python/3.9.6 modulefile, which serves as a "base" for the virtual environment. Users may customize the version of Python used in the virtual environment by loading specific Python modulefiles during the environment creation process.
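Because a venv depends on its base interpreter, you may need to look up which Python installation an existing environment was built on, for example to know which modulefile to load before activating it. venv records this in a pyvenv.cfg file at the top of the environment directory, where the home key holds the directory of the base interpreter. The helper name venv_base_python is hypothetical, and the sketch assumes the usual key = value layout of that file.

```shell
# Hypothetical helper: print the base-interpreter directory recorded by venv.
# Usage: venv_base_python ~/<env_name>
venv_base_python() {
  local cfg="$1/pyvenv.cfg"
  if [ ! -f "$cfg" ]; then
    echo "no pyvenv.cfg in $1; is it a venv?" >&2
    return 1
  fi
  # Lines look like "home = /path/to/python/bin"; print the value only
  sed -n 's/^home[[:space:]]*=[[:space:]]*//p' "$cfg"
}
```

For an environment built on python/3.9.6, the value would typically be the bin directory of that module's installation. Going by the which python output shown earlier, that is /nas/longleaf/rhel8/apps/python/3.9.6/bin.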

Modifying the installation path

In addition, users can create environments in different locations by modifying the installation path passed to the venv command.

$ module load python/3.5.1
$ python -m venv /users/<o>/<n>/<onyen>/<env_name>

In the above example, a new virtual environment with Python 3.5.1 is created in the user's /users directory. Before you can begin installing or using packages in your virtual environment, you will need to activate it:

$ source <path_to_env>/<env_name>/bin/activate

Note that the Python modulefile used as the environment's base must be loaded before the environment is activated for Python code to work within the environment. Once an environment is activated, its name will appear in parentheses to the left of the terminal prompt (e.g., (<env_name>) $). You can also verify that a virtual environment is active by running which python: it should return the path to an interpreter located inside the <env_name> folder. Once activated, you may install packages to your environment using pip:

(<env_name>) $ python -m pip install pandas==2.0.2
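The which python check can also be scripted. The activate script exports a VIRTUAL_ENV variable holding the environment's path. A minimal sketch can therefore confirm that this variable is set and that the first python on PATH lives inside that environment. The function name venv_is_active is hypothetical.

```shell
# Hypothetical helper: succeed only if a venv is active and the first
# python on PATH belongs to it.
venv_is_active() {
  # activate exports VIRTUAL_ENV; unset or empty means no venv is active
  [ -n "${VIRTUAL_ENV:-}" ] || return 1
  local py
  py=$(command -v python) || return 1
  case "$py" in
    "$VIRTUAL_ENV"/bin/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, venv_is_active && echo "using $VIRTUAL_ENV" prints the environment path only when the check passes.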

When used from within an active virtual environment, pip installs Python packages to the environment's site-packages directory without needing to be told to do so explicitly, so the --user flag is not needed. These packages will be available to you the next time you activate the virtual environment. To uninstall a package, use python -m pip uninstall <package_name>. To obtain a list of packages installed to the currently-active environment, use the pip freeze command. This command is often used to create a requirements file listing all of a project's dependencies:

(<env_name>) $ python -m pip freeze > requirements.txt

Requirements files are useful for reproducing the computational environment in which a scientific analysis was conducted. Users can use requirements files to install the dependencies needed to conduct an analysis on a new computer or in a new virtual environment using the following command:

(<new_env>) $ python -m pip install -r requirements.txt
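pip install -r reproduces exact versions only for lines that pin them with ==. Output from pip freeze normally does. Hand-edited files may not, and neither may lines of the form name @ URL, which pip freeze can emit for packages installed from local files. Such lines may not reproduce the same environment elsewhere. The hypothetical helper below lists them so they can be reviewed before sharing a requirements file. It assumes one requirement per line.

```shell
# Hypothetical helper: print requirement lines that do not pin an exact
# version with ==. Blank lines and comment lines are ignored.
# Usage: unpinned_requirements requirements.txt
unpinned_requirements() {
  # First grep drops blanks/comments; second drops "name==version" lines.
  # "|| true" keeps the exit status 0 when nothing is left to report.
  grep -v -E '^[[:space:]]*(#|$)' "$1" \
    | grep -v -E '^[A-Za-z0-9._-]+==' \
    || true
}
```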

To exit the currently-active virtual environment, simply run the deactivate command. Once deactivated, the name of the environment will no longer appear to the left of your terminal prompt.

(<env_name>) $ deactivate
$

To delete a venv, deactivate the environment and then delete the directory containing the environment.
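Deleting the directory cannot be undone, so a small guard can help avoid removing the wrong path. This sketch relies on the fact that venv writes a pyvenv.cfg file at the top of every environment it creates. It also refuses to delete the environment named in VIRTUAL_ENV, which is the currently active one. The function name remove_venv is hypothetical.

```shell
# Hypothetical helper: delete a venv directory after basic sanity checks.
# Usage: remove_venv ~/<env_name>
remove_venv() {
  local env_dir
  # Resolve to an absolute path so it can be compared with VIRTUAL_ENV
  env_dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || {
    echo "refusing: $1 is not a directory" >&2
    return 1
  }
  # venv writes pyvenv.cfg at the top of every environment it creates
  if [ ! -f "$env_dir/pyvenv.cfg" ]; then
    echo "refusing: $env_dir has no pyvenv.cfg, so it does not look like a venv" >&2
    return 1
  fi
  # Do not delete the environment that is currently active
  if [ "${VIRTUAL_ENV:-}" = "$env_dir" ]; then
    echo "refusing: $env_dir is active; run deactivate first" >&2
    return 1
  fi
  rm -rf -- "$env_dir"
}
```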

Creating virtual environments with conda

Conda is an environment and package manager that allows users to create virtual environments and install a variety of packages from the Anaconda open-source repository. Unlike virtual environments created using venv, conda environments are almost entirely self-contained and do not require one of the system-wide Python modulefiles for use as a base. As such, users can create conda environments with almost any version of Python and are not limited to the versions available as modulefiles. In addition, conda environments can be used to manage packages and dependencies for programming languages other than Python such as R.

To use conda, first load one of the available Anaconda modulefiles from within a command-line session on Longleaf or Dogwood. When loading the Anaconda modulefile, please ensure that there aren't other Python modulefiles loaded at the same time as this can create conflicts. The general syntax for creating a conda environment with a specific version of Python and activating that environment is as follows:

$ module purge
$ module load anaconda
$ conda create --name=<env_name> python=3.9
$ conda activate <env_name>

In the above example, an environment using Python 3.9 is created in the user's home directory under ~/.conda/envs/<env_name>. Alternatively, users can create an environment in a custom location by using the --prefix option to specify the installation directory.

$ conda create --prefix=/users/<o>/<n>/<onyen>/<env_name> python=3.9

Note that when activating a conda environment created using the --prefix option, the path to the directory containing the environment must be passed as an argument to the conda activate command:

$ conda activate /users/<o>/<n>/<onyen>/<env_name>

After activating a conda environment, users can install packages into it from several online channels; these packages will be available each time the environment is activated. Using the pandas package as an example, the following syntax installs a package from the Anaconda open-source repository:

(<env_name>) $ conda install -c anaconda pandas=2.0.2

In the above example, pandas version 2.0.2 is installed to the <env_name> environment. Many packages that are not included in the main Anaconda repository can be found on conda-forge, a community-run repository with a wide variety of Python packages. Using geopandas as an example, users can use the following syntax to install packages from conda-forge:

(<env_name>) $ conda install -c conda-forge geopandas

If you need to install a package that cannot be downloaded from the Anaconda or conda-forge repositories, you can use pip as an alternative method for installing the package.

(<env_name>) $ python -m pip install geopandas

In the above example, the geopandas package is downloaded from PyPI and installed to the site-packages directory of the currently-active conda environment (<env_name>/lib/python3.9/site-packages). Because conda is limited in its ability to control packages installed by pip, issues can arise when conda and pip are used together to build an environment. For this reason, you should avoid installing or updating packages using conda after doing pip installs within a conda environment.
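To see which packages in a conda environment came from pip, and therefore which ones conda cannot fully manage, you can filter the output of conda list. Packages installed with pip are listed with pypi in the Channel column. The hypothetical function below reads conda list output on standard input and prints those package names. It assumes the default column layout (Name, Version, Build, Channel).

```shell
# Hypothetical helper: print the names of pip-installed packages from
# "conda list" output read on stdin.
# Usage (inside an activated environment): conda list | pip_installed_packages
pip_installed_packages() {
  # Skip comment/header lines; keep rows whose last column (Channel) is "pypi"
  awk '!/^#/ && NF >= 4 && $NF == "pypi" { print $1 }'
}
```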

You can uninstall a package from your conda environment using conda remove <package-name>, which will remove the specified package and any package that depends on it. To view a list of packages installed to the currently-active environment, use the conda list command. To exit the currently-active conda environment, use the conda deactivate command. To delete a conda environment and all installed packages, deactivate your environment using conda deactivate and then run conda remove --name <env_name> --all (or conda remove --prefix <path_to_env> --all for an environment created with the --prefix option). For more information on managing Python environments using conda, please refer to the conda user guide and cheat sheet.

Using virtual environments in a SLURM job

To submit a job utilizing a virtual environment created using venv, users should include lines in their SLURM job submission script to load the Python modulefile used for the environment's base and to activate the environment. For example:

#!/bin/bash

#SBATCH -p general
#SBATCH -N 1
#SBATCH --mem 4g
#SBATCH -n 1
#SBATCH -t 2:00:00
#SBATCH --mail-type=end
#SBATCH --mail-user=<onyen>@email.unc.edu

module purge
module load python/3.9.6
source ~/<env_name>/bin/activate
python myscript.py

To submit a job utilizing a virtual environment created using conda, users should include lines in their SLURM job submission script to load the Anaconda modulefile and to activate the conda environment. For example:

#!/bin/bash

#SBATCH -p general
#SBATCH -N 1
#SBATCH --mem 4g
#SBATCH -n 1
#SBATCH -t 2:00:00
#SBATCH --mail-type=end
#SBATCH --mail-user=<onyen>@email.unc.edu

module purge
module load anaconda
conda activate <env_name>
python myscript.py

In some cases, a SLURM script has used the wrong Python distribution (that is, one other than the interpreter in your conda environment). This may be happening if packages installed in your conda environment are not available to your Python script (for example, you get a ModuleNotFoundError). To force the Python interpreter from your conda environment to be used, run conda run -n <env_name> python myscript.py instead of python myscript.py.
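A job script can also use plain python when the activated environment's interpreter is first on PATH, and fall back to conda run otherwise. A check along the following lines could be placed before the python command. It relies on conda activate exporting CONDA_PREFIX (the active environment's directory). It also assumes a named environment whose directory name matches <env_name>. The function name run_with_env_python is hypothetical.

```shell
# Hypothetical helper for a SLURM script: run Python from the named conda
# environment, falling back to "conda run" if another python is first on PATH.
# Usage: run_with_env_python <env_name> myscript.py [args...]
run_with_env_python() {
  local env_name=$1
  shift
  local py
  py=$(command -v python 2>/dev/null || true)
  # Use the python on PATH only if the active environment is <env_name>
  # and that python lives in the environment's bin directory.
  if [ -n "${CONDA_PREFIX:-}" ] && [ "$(basename -- "$CONDA_PREFIX")" = "$env_name" ]; then
    case "$py" in
      "$CONDA_PREFIX"/bin/*)
        "$py" "$@"
        return
        ;;
    esac
  fi
  echo "python on PATH (${py:-none}) is not from $env_name; using conda run" >&2
  conda run -n "$env_name" python "$@"
}
```

In the conda job script above, the last line would then become run_with_env_python <env_name> myscript.py.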

Using virtual environments in a Jupyter notebook

To access a previously-created conda environment from within a Jupyter notebook session in Open OnDemand, you must first create a kernel associated with the conda environment you would like to use. The ipykernel package must be installed in that environment for this to work. This can be done by starting a command-line session on Longleaf and entering the following series of commands (the conda install step can be skipped if ipykernel is already installed):

$ module load anaconda
$ conda activate <env_name>
(<env_name>) $ conda install ipykernel
(<env_name>) $ python -m ipykernel install --user --name=<env_name>

The next time you start a Jupyter notebook session in OnDemand, you should be able to select the kernel associated with your conda environment when creating a new notebook.
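To confirm that the kernel was registered before starting an OnDemand session, you can run jupyter kernelspec list, or check for the kernel's kernel.json file directly. With --user, kernels are written under the per-user Jupyter data directory. On Linux, that is ~/.local/share/jupyter/kernels/<name>, or $XDG_DATA_HOME/jupyter/kernels/<name> when that variable is set. The hypothetical helper below checks that location. It assumes JUPYTER_DATA_DIR is not set and that <env_name> is lowercase.

```shell
# Hypothetical helper: succeed if a per-user Jupyter kernel with this name exists.
# Assumes JUPYTER_DATA_DIR is unset and the kernel name is lowercase.
# Usage: kernel_registered <env_name>
kernel_registered() {
  local data_dir="${XDG_DATA_HOME:-$HOME/.local/share}/jupyter"
  [ -f "$data_dir/kernels/$1/kernel.json" ]
}
```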

Note that the OnDemand Jupyter notebook application is not currently compatible with Python environments created using venv.


Last Update 7/15/2024 8:46:16 PM