How to use Python on NERSC systems¶
There are 4 options for using and configuring your Python environment at NERSC. We provide a brief overview here and will explain each option in greater detail below. These options are the same for both Cori and Perlmutter.
- Module only
- Module + conda activate (most popular)
- Use a Shifter container (best practice for 10+ nodes)
- Install your own Python
If you intend to run at large scale (10+ nodes), Shifter is the best option. You can also place your Python installation or conda environment on our faster /global/common/software filesystem. We provide more discussion about how to achieve good performance by choosing the right filesystems.
Option 1: Module only¶
In this mode, you just module load python and use it however you like. This is the simplest option but also the least flexible. If you require a package that is not in our default modules, this option will not work for you.
Who should use Option 1?
Option 1 is best for users who want to get started quickly and who do not require special libraries or custom packages.
Option 2: Module + conda activate¶
In this mode, you first module load python and then build and use a conda environment on top of our module. To use this method:
    module load python
    conda activate myenv
To leave your environment:
    conda deactivate
and you will return to the base Python environment.
To create a custom environment using Option 2:
    module load python
    conda create --name myenv python=3.8
    conda activate myenv
    conda install <your package>
Who should use Option 2?
This is our most popular option. It is good for anyone who would like to use packages that are not available in the Python module.
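As a quick check that the steps in Option 2 took effect, the sketch below reports which conda environment is active in the current shell. It relies on the CONDA_DEFAULT_ENV variable that conda activate sets; the function name active_conda_env is a hypothetical helper for illustration, not a NERSC-provided tool.

```shell
# Hypothetical helper: print the name of the active conda environment,
# or "none" if no environment has been activated in this shell.
active_conda_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

echo "Active conda environment: $(active_conda_env)"

# Show which python executable is first on PATH, if any.
command -v python || echo "python not found on PATH"
```

After conda activate myenv, the first line should name myenv, and the python path should point inside that environment.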
Option 3: Install/Use Python inside a Shifter container¶
We strongly suggest this option for any user who needs to run Python on 10+ nodes. This will result in better performance for your own application, make you less vulnerable to filesystem slowdowns caused by other users, and of course prevent causing filesystem slowdowns for other users. Please see our Python in Shifter documentation and examples.
Who should use Option 3?
Option 3 is suitable for users willing to build their own software stack inside of a container. mpi4py works best at scale in Shifter.
Option 4: Install your own Python¶
You don't have to use any of the Python options we described above. You are free to install your own Python via Miniconda, Anaconda, Intel Python, or a custom collaboration install to have complete control over your stack.
Collaborations, projects, or experiments may wish to install a shareable, managed Python stack to /global/common/software, independent of the NERSC modules. You are welcome to use the Anaconda installer script for this purpose. In fact, you may want to consider the more "stripped-down" Miniconda installer as a starting point. That option allows you to start with only the bare essentials and build up. Be sure to select the Linux version in either case. For instance:
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b \
        -p /global/common/software/myproject/env
    [installation messages]
    source /global/common/software/myproject/env/bin/activate
    conda install <only-what-my-project-needs>
You can customize the path with the -p argument. Without it, the installation above would go to $HOME/miniconda3. You should also consider the PYTHONSTARTUP environment variable, which you may wish to unset altogether. It is mainly relevant to the system Python, which we advise against using.
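A minimal sketch of checking and clearing PYTHONSTARTUP for the current shell session is shown below. It only affects the shell in which it runs; whether you also want to remove the variable from your startup files is a separate choice.

```shell
# Report the current value of PYTHONSTARTUP, if it is set.
if [ -n "${PYTHONSTARTUP:-}" ]; then
    echo "PYTHONSTARTUP was set to: ${PYTHONSTARTUP}"
fi

# Remove the variable from this shell session.
unset PYTHONSTARTUP
```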
Who should use Option 4?
Option 4 is suitable for individuals or collaborations who would like to install, maintain, and control their own Python stack. Users who choose Option 4 should not combine their custom Python installations with our NERSC Python modules.
Using conda, mamba, and pip to install packages and manage environments¶
Overview of conda¶
Anaconda provides a conda cheat sheet you may find helpful.
To find available packages, you can use the conda search tool. To install packages, you can use the conda install command.
    conda search numpy
    conda install numpy
Conda has several default channels that will be used first for package installation. If you want to use another channel beyond the defaults channel, you can, but we suggest that you select your channel carefully. We also suggest that you choose channels as you need them rather than permanently adding them to your
conda config or
.condarc file. For example,
conda install numpy --channel conda-forge is better than
conda config --add channels conda-forge.
Conda will search the default channels first. This is good because it means that MKL-enabled NumPy will be installed which generally performs well on Cori's Intel hardware.
If, however, you have added other channels to your search path, for example conda-forge, the packages that conda decides to install from conda-forge may not be optimal for NERSC. In this example, you will likely get a version of NumPy that uses OpenBLAS instead of MKL, and this can be substantially slower on Cori.
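To see which BLAS library an installed NumPy was built against, one option is to print NumPy's build configuration and look for "mkl" or "openblas" in the output. The sketch below wraps this in a hypothetical helper named check_numpy_blas; it assumes python on your PATH is the interpreter from the environment you want to inspect.

```shell
# Hypothetical helper: print NumPy's build configuration so the BLAS/LAPACK
# sections can be inspected. Prints a notice if NumPy cannot be imported.
check_numpy_blas() {
    if python -c "import numpy" 2>/dev/null; then
        python -c "import numpy; numpy.show_config()"
    else
        echo "NumPy is not importable in the current environment"
    fi
}

check_numpy_blas
```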
If you find conda is slow, try mamba
The conda tool can sometimes be very slow when it's resolving packages in large and complex environments. You can try mamba instead of conda by simply replacing conda with mamba in your commands.
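One way to prefer mamba when it is installed and fall back to conda otherwise is sketched below. The helper name pick_conda_tool is hypothetical, and the example only prints the command it would run rather than installing anything.

```shell
# Hypothetical helper: echo "mamba" if it is available, otherwise "conda".
pick_conda_tool() {
    if command -v mamba >/dev/null 2>&1; then
        echo "mamba"
    else
        echo "conda"
    fi
}

tool="$(pick_conda_tool)"
# Print the command instead of running it.
echo "Would run: ${tool} install numpy"
```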
Installing libraries via pip¶
You can use pip at NERSC via our Python modules, but users should be aware of several features of pip behavior that can cause problems. Anaconda provides some Best practices for using pip with conda. Our suggested use of pip is inside a conda environment. This makes it very easy to know exactly where packages are installed and also easy to clean them up completely when you are done. We suggest the following:
    module load python
    conda activate myenv
    pip install numpy
The following pip install options are useful when you need to rebuild a package from source in your environment:
- -v: verbose output, useful for debugging and confirming expected behavior.
- --force-reinstall: forces a reinstall/rebuild in case the package is already installed.
- --no-cache-dir: don't use the local package cache, we want a fresh download of the source code.
- --no-binary: we want to build the package from source so don't use existing binaries.
- --no-build-isolation: build the package using dependencies from the current environment.
- --no-deps: don't install dependent packages, we want to use the ones in the current environment.
See the pip documentation for more information.
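As a sketch of how these options might be combined into one command, the hypothetical helper pip_source_cmd below assembles and prints a pip install line for a single package. It does not run pip, and mpi4py is used purely as an example package name.

```shell
# Hypothetical helper: print (but do not run) a pip command that rebuilds
# one package from source using the options listed above.
pip_source_cmd() {
    pkg="$1"
    opts="-v --force-reinstall --no-cache-dir"
    opts="${opts} --no-binary=${pkg}"      # build only this package from source
    opts="${opts} --no-build-isolation"    # use build tools from the current environment
    opts="${opts} --no-deps"               # leave already-installed dependencies alone
    echo "pip install ${opts} ${pkg}"
}

# mpi4py is only an example package name.
pip_source_cmd mpi4py
```

Review the printed command, then run it inside your activated conda environment if it matches what you intend.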
pip search path can find incompatible packages
When you run pip install <package>, the pip tool will traverse its search path and may discover an old version of the package. Options such as --no-cache-dir help ensure that a new and compatible package is installed. This is even more important now that our Cori and Perlmutter systems are sharing filesystems.
Using conda clone¶
Cloning conda environments gives you the ability to copy a preexisting conda environment and modify it as you like. One example of a good use of conda clone is to copy the NERSC machine learning modules like TensorFlow so you can install your own packages. You can find the location of the environment you'd like to clone by using
module show tensorflow, for example.
    module load python
    conda create --name my-tensorflow --clone /usr/common/software/tensorflow/intel-tensorflow/2.2.0-py37
    source activate my-tensorflow
    python -m ipykernel install --user --name my-tensorflow --display-name my-tensorflow
    conda install <new package>
If you have questions about this, please don't hesitate to submit a ticket.
Moving your conda setup to /global/common/software¶
For better performance or if you plan to run your application at scale, consider installing your custom environment in your project's directory on /global/common/software:
    conda create --prefix /global/common/software/myproject/myenv python=3.8
    source activate /global/common/software/myproject/myenv
    conda install numpy scipy astropy
You can also change your default conda location to /global/common/software. An easy way to do this is to change the settings in your .condarc file:
    envs_dirs:
      - /global/common/software/<your project>/conda
    pkgs_dirs:
      - /global/common/software/<your project>/conda
    channels:
      - defaults
This will place all of your environments in this directory by default, and you won't have to worry about specifying the full prefix to your environment when installing it or activating it.
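To confirm that conda has picked up the new locations, you can ask it to print the relevant settings. The sketch below uses a hypothetical helper, show_conda_dirs, and prints a reminder instead of failing when conda is not yet on your PATH.

```shell
# Hypothetical helper: print the environment and package cache directories
# that conda is currently configured to use.
show_conda_dirs() {
    if command -v conda >/dev/null 2>&1; then
        conda config --show envs_dirs pkgs_dirs
    else
        echo "conda is not on PATH; run 'module load python' first"
    fi
}

show_conda_dirs
```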
We are aware the project directory quotas on
/global/common/software are small. Please open a ticket at
help.nersc.gov if you need more space.
conda init +
We previously supported a different Option 3 in which a user could configure their conda setup via
    module load python
    conda init
We have now deprecated this and do not recommend it for several reasons:
- The .bashrc file is shared between Cori and Perlmutter
- No NERSC-provided settings like
- Confusing interactions between the conda init Python setup and the python module
To stop using this option, you can run the command
conda init --reverse or simply delete the lines that conda init has added to your .bashrc file.
These may look like:
    # >>> conda initialize >>>
    # !! Contents within this block are managed by 'conda init' !!
    __conda_setup="$('/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
    if [ $? -eq 0 ]; then
        eval "$__conda_setup"
    else
        if [ -f "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh" ]; then
            . "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh"
        else
            export PATH="/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin:$PATH"
        fi
    fi
    unset __conda_setup
    # <<< conda initialize <<<
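If you prefer to remove the block by hand, one possible approach is to filter it out with sed and review the result before replacing your real .bashrc. The sketch below assumes the block is delimited by the two "conda initialize" marker lines starting in the first column; the helper name strip_conda_init is hypothetical, and the demonstration runs on a throwaway file.

```shell
# Hypothetical helper: print a shell startup file with the block managed by
# 'conda init' removed. The input file itself is not modified.
strip_conda_init() {
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$1"
}

# Demonstrate on a temporary file rather than a real .bashrc.
demo="$(mktemp)"
printf '%s\n' \
    'export EDITOR=vim' \
    '# >>> conda initialize >>>' \
    '__conda_setup="placeholder"' \
    '# <<< conda initialize <<<' \
    'alias ll="ls -l"' > "${demo}"

strip_conda_init "${demo}"
rm -f "${demo}"
```

Only the export and alias lines should remain in the printed output. Keep a backup copy of your .bashrc before applying the same filter to it.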
You can continue to use your existing conda environments with our directions in Option 2. If you have questions about this, please open a ticket at our helpdesk.