This guide presents an overview of managing Python packages and environments on the Longleaf and Dogwood clusters.
On Longleaf and Dogwood, there are several pre-installed versions of Python available as modulefiles. To view the available versions of Python, run the following command from within a command-line session on Longleaf or Dogwood:
$ module avail python

------------------------------------------ /nas/longleaf/apps/lmod/modulefiles/Core -------------------------------------------
   python/2.4.6     python/2.7.13       python/3.6.6    python/3.7.14    python/3.9.6
   python/2.7.12    python/3.5.1 (D)    python/3.7.9    python/3.8.8

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
To load a specific version of Python into your environment, use the module load command. To identify the location of the currently loaded Python interpreter, use the which command. For example:
$ module load python/3.9.6
$ which python
/nas/longleaf/rhel8/apps/python/3.9.6/bin/python
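From inside a running script or notebook, the standard-library sys module reports the interpreter that is actually executing, which is a handy cross-check on which python. A minimal sketch; describe_interpreter is a hypothetical helper name:

```python
import sys


def describe_interpreter() -> dict:
    """Return basic facts about the interpreter running this code."""
    return {
        # Absolute path of the running interpreter; after `module load`,
        # this should point into that module's installation tree
        "executable": sys.executable,
        # Version as "major.minor.micro", e.g. "3.9.6"
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```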
The system-wide versions of Python often include common scientific packages such as scipy and are sufficient for basic computations; however, most research projects will require users to install additional packages themselves using pip or conda.
There are numerous open-source packages available for download through the Python Package Index (PyPI) repository. Pip is a package-management system that installs packages from PyPI through a command-line interface. Using the pandas package as an example, users can install the latest release of the package to their home directory from within a command-line session on Longleaf using the following commands:
$ module load python/3.9.6
$ python -m pip install --user pandas
Because users only have write access to specific locations on Longleaf, such as their home directories, it is necessary to include the --user flag when doing pip installs outside of a virtual environment. In the above example, pandas will be installed to the user-specific site-packages directory for Python 3.9.
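Python's standard-library site module reports the per-user site-packages directory that --user installs target, which can help confirm where a package landed. A minimal sketch; user_site_info is a hypothetical helper name:

```python
import site
import sys


def user_site_info() -> dict:
    """Report where `pip install --user` places packages for this interpreter."""
    return {
        # Per-user site-packages directory for this Python version
        "user_site": site.getusersitepackages(),
        # True only if the user site directory is enabled; it is disabled
        # inside a venv by default and when Python runs with -s
        "enabled": bool(site.ENABLE_USER_SITE),
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
    }


if __name__ == "__main__":
    print(user_site_info())
```

Because the directory name includes the Python minor version, packages installed with --user under one Python module are not picked up by a different minor version.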
Pip can be used to install specific package versions using the == specifier:
$ python -m pip install --user pandas==2.0.2
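Similarly, minimum package version requirements can be specified using the >= specifier. Quote the requirement so that the shell does not interpret > as output redirection:
$ python -m pip install --user "pandas>=2.0.2"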
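When typed into a shell, a specifier containing > must be quoted (for example, "pandas>=2.0.2"); otherwise the shell treats > as output redirection. When pip is driven from a Python script instead, passing the arguments as a list avoids the shell entirely. A minimal sketch; pip_install_argv and pip_install are hypothetical helper names, and invoking pip through sys.executable makes it install into the same interpreter that runs the script:

```python
import subprocess
import sys


def pip_install_argv(requirement: str, user: bool = True) -> list:
    """Build the argument list for `python -m pip install [--user] <requirement>`."""
    argv = [sys.executable, "-m", "pip", "install"]
    if user:
        argv.append("--user")  # needed outside a virtual environment on shared systems
    argv.append(requirement)   # e.g. "pandas==2.0.2" or "pandas>=2.0.2"
    return argv


def pip_install(requirement: str, user: bool = True) -> None:
    # No shell is involved, so ">=" reaches pip verbatim instead of
    # being interpreted as an output redirection
    subprocess.run(pip_install_argv(requirement, user), check=True)


if __name__ == "__main__":
    print(pip_install_argv("pandas>=2.0.2"))
```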
For more information on installing packages using pip, please refer to the pip user guide.
In order to manage dependency requirements for multiple projects, we strongly recommend that users install their local packages into isolated "virtual" Python environments. A Python virtual environment is a directory structure that contains all the necessary executables and packages needed to build and run a Python-based project. Using separate virtual environments for each project allows users to install and upgrade Python packages as needed without having to worry about creating dependency conflicts in their other projects. In addition, using virtual environments helps to improve scientific reproducibility by allowing others to more easily replicate the computational environment in which an analysis was conducted.
Venv is a lightweight tool for creating virtual environments that is included as part of the Python 3 Standard Library. To create a new virtual environment using venv, first load one of the Python 3 modulefiles from within a command-line session on Longleaf, then invoke the venv command followed by the path to the directory where you would like the environment to be created.
$ module purge
$ module load python/3.9.6
$ python -m venv ~/<env_name>
In the above example, a new virtual environment with Python 3.9.6 is created in the user's home directory. This environment is created on top of the existing Python installation associated with the python/3.9.6 modulefile, which serves as a "base" for the virtual environment. Users may customize the version of Python used in the virtual environment by loading specific Python modulefiles during the environment creation process. In addition, users can create environments in different locations by modifying the installation path passed to the venv command.
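Every venv records its base interpreter in a pyvenv.cfg file at the top of the environment directory; the home entry points at the directory containing the base Python, which helps explain why the environment depends on that base installation. The sketch below uses the standard-library venv module (the same module that python -m venv runs) to create an environment and read that file; create_env_and_read_base is a hypothetical helper name:

```python
import tempfile
import venv
from pathlib import Path


def create_env_and_read_base(env_dir: Path) -> dict:
    """Create a venv (like `python -m venv <env_dir>`) and return its pyvenv.cfg settings."""
    # with_pip=False keeps this fast and offline; `python -m venv` installs pip by default
    venv.create(env_dir, with_pip=False)
    settings = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env_and_read_base(Path(tmp) / "demo_env")
        # "home" is the directory of the base interpreter this venv was built on
        print(cfg["home"])
```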
$ module load python/3.5.1
$ python -m venv /work/users/<o>/<n>/<onyen>/<env_name>
In the above example, a new virtual environment with Python 3.5.1 is created in the user's /work directory. Before you can begin installing or using packages in your virtual environment, you will need to activate it:
$ source <path_to_env>/<env_name>/bin/activate
Note that the Python modulefile used as the environment's base must be loaded before the environment is activated for Python code to work within the environment. Once an environment is activated, its name will appear in parentheses to the left of the terminal prompt (i.e., (<env_name>) $). You can also verify that you are in an active virtual environment by checking the location of your Python interpreter using the which python command, which should return the path to an interpreter located inside the <env_name> folder. Once activated, you may install packages to your environment using pip:
(<env_name>) $ python -m pip install pandas==2.0.2
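The which python check can also be made from inside Python, which is useful in scripts that should refuse to run outside their environment. In a venv, sys.prefix points at the environment directory while sys.base_prefix still points at the base installation. Conda environments do not follow this pattern, so this sketch detects venvs only; venv_prefix is a hypothetical helper name:

```python
import os
import sys
from typing import Optional


def venv_prefix() -> Optional[str]:
    """Return the venv directory if this interpreter runs inside a venv, else None."""
    # Inside a venv, sys.prefix is the environment while
    # sys.base_prefix still refers to the base Python installation
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


if __name__ == "__main__":
    prefix = venv_prefix()
    print(f"active venv: {prefix}" if prefix else "not running inside a venv")
    # `source <env>/bin/activate` also exports VIRTUAL_ENV for child processes
    print("VIRTUAL_ENV =", os.environ.get("VIRTUAL_ENV"))
```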
When used from within an active virtual environment, pip installs Python packages to the environment's site-packages directory automatically, without the --user flag. These packages will be available to you the next time you activate the virtual environment. To uninstall a package, use pip uninstall <packagename>. To obtain a list of packages installed to the currently active environment, users may use the pip freeze command. This command is often used to create a requirements file listing all of a project's dependencies.
(<env_name>) $ python -m pip freeze > requirements.txt
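For illustration, a rough equivalent of pip freeze can be built on the standard-library importlib.metadata module, which reads the metadata of installed distributions. This sketch (freeze_lines is a hypothetical name) is not a replacement: unlike pip freeze, it does not skip the packaging tools that pip omits by default, and it does not render editable or URL-based installs in pip's special formats.

```python
from importlib import metadata
from typing import List


def freeze_lines() -> List[str]:
    """Approximate `pip freeze`: one "name==version" line per installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name:  # skip distributions with broken or missing metadata
            lines.add(f"{name}=={version}")
    # Case-insensitive sort, similar to pip's output ordering
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```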
Requirements files are useful for reproducing the computational environment in which a scientific analysis was conducted. Users can use requirements files to install the dependencies needed to conduct an analysis on a new computer or in a new virtual environment using the following command:
(<new_env>) $ python -m pip install -r requirements.txt
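After installing from a requirements file, it can be useful to confirm that the new environment actually matches the pinned versions. A simplified sketch, assuming exact name==version pins like those pip freeze writes: it compares versions as plain strings rather than applying full PEP 440 rules, and parse_pins and mismatched_pins are hypothetical helper names.

```python
from importlib import metadata
from typing import Dict


def parse_pins(text: str) -> Dict[str, str]:
    """Map name -> version for exact "name==version" lines; other lines are ignored."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and environment markers; keep only the requirement itself
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        name, sep, version = line.partition("==")
        if sep and name.strip() and version.strip():
            pins[name.strip()] = version.strip()
    return pins


def mismatched_pins(text: str) -> Dict[str, str]:
    """Return {name: problem} for pins the current environment does not satisfy."""
    problems = {}
    for name, wanted in parse_pins(text).items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, requirements say {wanted}"
    return problems


if __name__ == "__main__":
    with open("requirements.txt") as fh:
        for name, problem in mismatched_pins(fh.read()).items():
            print(f"{name}: {problem}")
```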
To exit the currently active virtual environment, simply run the deactivate command. Once deactivated, the name of the environment will no longer appear to the left of your terminal prompt.
(<env_name>) $ deactivate
$
To delete a venv, deactivate the environment and then delete the directory containing the environment.
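Because a venv is just a directory, deleting it is an ordinary recursive delete. A small guard avoids removing the wrong directory by mistake: venv writes a pyvenv.cfg file at the top of every environment, so the hypothetical helper below refuses to delete anything without one.

```python
import shutil
from pathlib import Path


def delete_venv(env_dir: str) -> None:
    """Delete a venv directory after checking that it really looks like one."""
    path = Path(env_dir).expanduser().resolve()
    # pyvenv.cfg is the marker file venv creates at the top of each environment
    if not (path / "pyvenv.cfg").is_file():
        raise ValueError(f"{path} does not look like a venv (no pyvenv.cfg)")
    shutil.rmtree(path)


if __name__ == "__main__":
    # Hypothetical path: replace with the environment you want to remove
    delete_venv("~/<env_name>")
```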
Conda is an environment and package manager that allows users to create virtual environments and install a variety of packages from the Anaconda open-source repository. Unlike virtual environments created using venv, conda environments are almost entirely self-contained and do not require one of the system-wide Python modulefiles for use as a base. As such, users can create conda environments with almost any version of Python and are not limited to the versions available as modulefiles. In addition, conda environments can be used to manage packages and dependencies for programming languages other than Python, such as R.
To use conda, first load one of the available Anaconda modulefiles from within a command-line session on Longleaf or Dogwood. When loading the Anaconda modulefile, please ensure that there aren't other Python modulefiles loaded at the same time as this can create conflicts. The general syntax for creating a conda environment with a specific version of Python and activating that environment is as follows:
$ module purge
$ module load anaconda
$ conda create --name=<env_name> python=3.9
$ conda activate <env_name>
In the above example, an environment using Python 3.9 is created in the user's home directory under ~/.conda/envs/<env_name>. Alternatively, users can create an environment in a custom location by using the --prefix option to specify the installation directory.
$ conda create --prefix=/work/users/<o>/<n>/<onyen>/<env_name> python=3.9
Note that when activating a conda environment created using the --prefix option, the path to the directory containing the environment must be passed as an argument to the conda activate command:
$ conda activate /work/users/<o>/<n>/<onyen>/<env_name>
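Conda marks every environment directory with a conda-meta subdirectory, and conda activate exports the environment's full path as CONDA_PREFIX, whether the environment was created with --name or --prefix. A minimal sketch that uses both facts; is_conda_env and active_conda_env are hypothetical helper names:

```python
import os
import sys
from pathlib import Path
from typing import Optional


def is_conda_env(prefix: str) -> bool:
    """A conda environment directory contains a conda-meta/ subdirectory."""
    return (Path(prefix) / "conda-meta").is_dir()


def active_conda_env() -> Optional[str]:
    """Return the full path of the activated conda environment, if any."""
    # Set by `conda activate` to the environment's directory
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix and is_conda_env(prefix):
        return prefix
    return None


if __name__ == "__main__":
    print("active conda env:", active_conda_env())
    print("this interpreter lives in a conda env:", is_conda_env(sys.prefix))
```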
After activating a conda environment, users can install packages from several online channels; these packages will be available each time the environment is activated. Using the pandas package as an example, users can use the following syntax to install packages from the Anaconda open-source repository:
(<env_name>) $ conda install -c anaconda pandas=2.0.2
In the above example, pandas version 2.0.2 is installed to the <env_name> environment. Many packages that are not included in the main Anaconda repository can be found on conda-forge, a community-run repository with a wide variety of Python packages. Using geopandas as an example, users can use the following syntax to install packages from conda-forge:
(<env_name>) $ conda install -c conda-forge geopandas
If you need to install a package that cannot be downloaded from the Anaconda or conda-forge repositories, you can use pip as an alternative method for installing the package.
(<env_name>) $ python -m pip install geopandas
In the above example, the geopandas package is downloaded from PyPI and installed to the site-packages directory of the currently active conda environment (<env_name>/lib/python3.9/site-packages). Because conda is limited in its ability to control packages installed by pip, issues can arise when conda and pip are used together to create an environment. For this reason, you should avoid installing or updating packages using conda after doing pip installs within a conda environment.
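To see which packages in a conda environment came from pip before running further conda installs or updates, you can check the INSTALLER file that installers record in each package's .dist-info metadata; pip writes pip there. A minimal sketch; pip_installed_packages is a hypothetical name, and since packages installed by conda may record a different installer or none at all, it lists pip installs only:

```python
from importlib import metadata
from typing import List


def pip_installed_packages() -> List[str]:
    """List distributions whose INSTALLER record says they were installed by pip."""
    names = set()
    for dist in metadata.distributions():
        # read_text returns None when the INSTALLER file is absent
        installer = (dist.read_text("INSTALLER") or "").strip()
        name = dist.metadata.get("Name")
        if installer == "pip" and name:
            names.add(name)
    return sorted(names, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pip_installed_packages()))
```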
You can uninstall a package from your conda environment using conda remove <package-name>, which will remove the specified package and any packages that depend on it. Users can view a list of packages installed to the currently active environment using the conda list command. Users may exit the currently active conda environment using the conda deactivate command. To delete a conda environment and all installed packages, deactivate your environment using conda deactivate and then run conda remove --name <env_name> --all (or conda remove --prefix <path_to_env> --all for an environment created with --prefix). For more information on managing Python environments using conda, please refer to the conda user guide and cheat sheet.
To submit a job utilizing a virtual environment created using venv, users should include lines in their SLURM job submission script to load the Python modulefile used for the environment's base and to activate the environment. For example:
#!/bin/bash
#SBATCH -p general
#SBATCH -N 1
#SBATCH --mem 4g
#SBATCH -n 1
#SBATCH -t 2:00:00
#SBATCH --mail-type=end
#SBATCH --mail-user=<onyen>@email.unc.edu

module purge
module load python/3.9.6
source ~/<env_name>/bin/activate
python myscript.py
To submit a job utilizing a virtual environment created using conda, users should include lines in their SLURM job submission script to load the Anaconda modulefile and to activate the conda environment. For example:
#!/bin/bash
#SBATCH -p general
#SBATCH -N 1
#SBATCH --mem 4g
#SBATCH -n 1
#SBATCH -t 2:00:00
#SBATCH --mail-type=end
#SBATCH --mail-user=<onyen>@email.unc.edu

module purge
module load anaconda
conda activate <env_name>
python myscript.py
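A common safeguard is to have the job itself stop early if the expected environment was not activated, rather than failing later on a missing import. The sketch below could sit at the top of myscript.py; require_environment and the expected path are hypothetical, and it works for both venv and conda environments because sys.prefix points at the environment directory in each case.

```python
import os
import sys


def require_environment(expected_prefix: str) -> None:
    """Exit early if the job is not running in the expected environment directory."""
    actual = os.path.realpath(sys.prefix)
    expected = os.path.realpath(os.path.expanduser(expected_prefix))
    # SLURM_JOB_ID is set by SLURM inside a batch job
    job = os.environ.get("SLURM_JOB_ID", "<not in a SLURM job>")
    if actual != expected:
        sys.exit(f"job {job}: expected environment {expected}, got {actual}")
    print(f"job {job}: using {sys.executable}", flush=True)


if __name__ == "__main__":
    # Hypothetical path: replace with the venv or conda environment the job should use
    require_environment("~/<env_name>")
```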
To access a previously created conda environment from within a Jupyter notebook session in Open OnDemand, you must first create a kernel associated with the conda environment you would like to use. This can be done by starting a command-line session on Longleaf and entering the following series of commands (the ipykernel package must be installed in the environment; if it is not, install it first with conda install ipykernel):
$ module load anaconda
$ conda activate <env_name>
(<env_name>) $ python -m ipykernel install --user --name=<env_name>
The next time you start a Jupyter notebook session in OnDemand, you should be able to select the kernel associated with your conda environment when creating a new notebook.
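python -m ipykernel install --user writes a kernel specification (a kernel.json file in a directory named after --name) to the user's Jupyter data directory. The sketch below lists those specs so you can confirm that a kernel's argv points at the Python inside your conda environment. It assumes the Linux default location ~/.local/share/jupyter/kernels (JUPYTER_DATA_DIR or XDG_DATA_HOME can change this); user_kernels is a hypothetical helper name, and jupyter kernelspec list reports similar information from the command line.

```python
import json
from pathlib import Path
from typing import Dict


def user_kernels(kernels_dir: str = "~/.local/share/jupyter/kernels") -> Dict[str, dict]:
    """Map kernel name -> {display_name, argv} for kernel specs found in kernels_dir."""
    kernels = {}
    for spec_file in sorted(Path(kernels_dir).expanduser().glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        kernels[spec_file.parent.name] = {
            # Name shown in the notebook launcher
            "display_name": spec.get("display_name"),
            # argv[0] should be the python inside the conda environment
            "argv": spec.get("argv", []),
        }
    return kernels


if __name__ == "__main__":
    for name, info in user_kernels().items():
        print(name, info["argv"][:1])
```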
Last Update 11/29/2023 5:41:26 PM