How to use Python on NERSC systems¶
There are 4 options for using and configuring your Python environment at NERSC. We provide a brief overview here and will explain each option in greater detail below. These options are the same for both Cori and Perlmutter.
- Module only
- Module + conda activate (most popular)
- Use a Shifter container (best practice for 10+ nodes)
- Install your own Python
If you intend to run at large scale (10+ nodes), Shifter is the best option. You can also place your Python installation or conda environment on our faster
/global/common/software filesystem. We provide more discussion about how to achieve good performance by choosing the right filesystems.
Option 1: Module only¶
In this mode, you just
module load python and use it however you like. This is the simplest option but also the least flexible. If you require a package that is not in our default modules, this option will not work for you.
Who should use Option 1?
Option 1 is best for users who want to get started quickly and who do not require special libraries or custom packages.
Option 2: Module + conda activate¶
In this mode, you first
module load python and then build and use a conda environment on top of our module. To use this method:
module load python
conda activate myenv
To leave your environment, run conda deactivate and you will return to the base Python environment.
To create a custom environment using Option 2
module load python
conda create --name myenv python=3.8
conda activate myenv
conda install <your package>
Who should use Option 2?
This is our most popular option. It is good for anyone who would like to use packages that are not available in the Python module.
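After activating an environment, it can be useful to confirm that Python is really running from it. The following is a minimal sketch, not an official NERSC tool: `describe_environment` is a hypothetical helper name, and it assumes the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables that `conda activate` normally sets.

```python
import os
import sys


def describe_environment(environ=None):
    """Return a dict describing which Python and conda environment are active."""
    # Allow a custom mapping to be passed in (handy for testing); default to the real environment.
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # root directory of the running Python installation
        # Both variables are normally set by `conda activate`; None means they are unset.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `prefix` does not match `conda_prefix`, the interpreter being run is probably not the one from the activated environment.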
Option 3: Install/Use Python inside a Shifter container¶
We strongly suggest this option for any user who needs to run Python on 10+ nodes. This will result in better performance for your own application, make you less vulnerable to filesystem slowdowns caused by other users, and avoid causing filesystem slowdowns for other users. Please see our Python in Shifter documentation and examples.
Who should use Option 3?
Option 3 is suitable for users willing to build their own software stack inside of a container. mpi4py works best at scale in Shifter.
Option 4: Install your own Python¶
You don't have to use any of the Python options described above. You are free to install your own Python via Miniconda, Anaconda, Intel Python, or a custom collaboration install to have complete control over your stack.
Collaborations, projects, or experiments may wish to install a shareable, managed Python stack to
/global/common/software independent of the NERSC modules. You are welcome to use the Anaconda installer script for this purpose. In fact, you may want to consider the more "stripped-down" Miniconda installer as a starting point. That option allows you to start with only the bare essentials and build up. Be sure to select the Linux version in either case. For instance:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b \
    -p /global/common/software/myproject/env
[installation messages]
source /global/common/software/myproject/env/bin/activate
conda install <only-what-my-project-needs>
You can customize the path with the
-p argument. The installation above would go to
$HOME/miniconda3 without it. You should also consider the
PYTHONSTARTUP environment variable which you may wish to unset altogether. It is mainly relevant to the system Python we advise against using.
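One way to check whether startup-related variables are set in your session is sketched below. Only `PYTHONSTARTUP` is mentioned above; the other variable names in `WATCHED` are an added assumption about what is also worth checking, and `find_python_overrides` is a hypothetical helper name.

```python
import os

# Variables that can change how Python starts up or where it finds packages.
# PYTHONSTARTUP comes from the text above; the others are an illustrative addition.
WATCHED = ("PYTHONSTARTUP", "PYTHONPATH", "PYTHONHOME", "PYTHONUSERBASE")


def find_python_overrides(environ=None):
    """Return the watched variables that are set to a non-empty value."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in WATCHED if environ.get(name)}


overrides = find_python_overrides()
if overrides:
    for name, value in overrides.items():
        print(f"{name} is set to {value}")
else:
    print("No Python startup overrides found")
```

Any variable reported here can be cleared in your shell with `unset` before using a custom installation.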
Who should use Option 4?
Option 4 is suitable for individuals or collaborations who would like to install, maintain, and control their own Python stack. Users who choose Option 4 should not combine their custom Python installations with our NERSC Python modules.
Using conda, mamba, and pip to install packages and manage environments¶
Overview of conda¶
Anaconda provides a conda cheat sheet you may find helpful.
To find available packages, you can use the
conda search tool. To install packages, you can use the
conda install command.
conda search numpy
conda install numpy
Conda has several default channels that will be used first for package installation. If you want to use another channel beyond the defaults channel, you can, but we suggest that you select your channel carefully. We also suggest that you choose channels as you need them rather than permanently adding them to your
conda config or
.condarc file. For example,
conda install numpy --channel conda-forge is better than
conda config --add channels conda-forge.
The installed package and/or its dependencies may vary depending on the conda channel it is installed from. For example, installing numpy from the defaults channel will install an MKL BLAS backend, while installing numpy from the conda-forge channel will install an OpenBLAS backend.
Installing numpy from conda-forge with MKL
To install numpy from conda-forge with an MKL BLAS backend, use:
conda install -c conda-forge numpy "libblas=*=*mkl"
For more information about selecting a BLAS implementation with conda-forge, see this section of the conda-forge channel knowledge base.
In some cases, you may need to specify more than one conda channel to satisfy a package's dependency requirements. The order in which channels are specified can matter when a package or one of its dependencies is provided by more than one of the channels. For more details, see the Managing Channels page of the conda documentation.
If you find conda is slow, try mamba
The conda tool can sometimes be very slow when it's resolving packages in large and complex environments. You can try mamba instead of conda by simply replacing conda with mamba in your commands.
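Since mamba can be used in place of conda, a script that drives package installs can prefer mamba when it is on the PATH and fall back to conda otherwise. This is only an illustrative sketch; `pick_installer` is a hypothetical helper, not part of either tool.

```python
import shutil


def pick_installer(candidates=("mamba", "conda"), which=shutil.which):
    """Return the first tool from `candidates` that is found on PATH, or None."""
    for name in candidates:
        # shutil.which returns the full path of the executable, or None if it is not found.
        if which(name) is not None:
            return name
    return None


tool = pick_installer()
print(f"Using: {tool}" if tool else "Neither mamba nor conda found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is enough.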
Installing libraries via pip¶
You can use pip at NERSC via our Python modules, but users should be aware of several features of
pip behavior that can cause problems. Anaconda provides some Best practices for using pip with conda. Our suggested use of
pip is inside a conda environment. This makes it very easy to know exactly where packages are installed and also easy to clean them up completely when you are done. We suggest the following:
module load python
conda activate myenv
pip install numpy
Some pip install options you may find useful:
- -v: verbose output, useful for debugging and confirming expected behavior.
- --force-reinstall: forces a reinstall/rebuild in case the package is already installed.
- --no-cache-dir: don't use the local package cache; we want a fresh download of the source code.
- --no-binary: build the package from source instead of using existing binaries (takes a package name or :all:).
- --no-build-isolation: build the package using dependencies from the current environment.
- --no-deps: don't install dependent packages; we want to use the ones in the current environment.
See the pip documentation for more information.
pip search path can find incompatible packages
When you run pip install <package>, the pip tool will traverse its search path and may discover an old version of the package. You may need the --force-reinstall and --no-cache-dir options to ensure a new and compatible package will be installed.
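To see which copy of a package Python actually picks up after an install, you can inspect its import location and installed version. The sketch below uses only the standard library (`importlib.metadata` requires Python 3.8 or newer); `locate_package` is a hypothetical helper name, and `numpy` is used purely as an example.

```python
import importlib.util
from importlib import metadata


def locate_package(import_name, dist_name=None):
    """Report where a module would be imported from and which version is installed."""
    # find_spec locates the module on the import path without importing it.
    spec = importlib.util.find_spec(import_name)
    location = spec.origin if spec is not None else None
    try:
        # The distribution name can differ from the import name, hence the optional argument.
        version = metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"module": import_name, "location": location, "version": version}


print(locate_package("numpy"))
```

If the reported location is outside your conda environment, for example under your home directory, an older copy is shadowing the one you meant to install.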
Using conda clone¶
Cloning conda environments gives you the ability to copy a preexisting conda environment and modify it as you like. One example of a good use of conda clone is to copy the NERSC machine learning modules like TensorFlow so you can install your own packages. You can find the location of the environment you'd like to clone by using
module show tensorflow, for example.
module load python
conda create --name my-tensorflow --clone /global/common/software/nersc/pm-2022q4/sw/tensorflow/2.9.0
source activate my-tensorflow
python -m ipykernel install --user --name my-tensorflow --display-name my-tensorflow
conda install <new package>
If you have questions about this, please don't hesitate to submit a ticket.
Moving your conda setup to /global/common/software
For better performance or if you plan to run your application at scale, consider installing your custom environment in your project's directory on /global/common/software:
conda create --prefix /global/common/software/myproject/myenv python=3.8
source activate /global/common/software/myproject/myenv
conda install numpy scipy astropy
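Once the environment is activated, a quick check can confirm that the running interpreter really lives under the intended filesystem. This is a minimal sketch: `is_under` is a hypothetical helper, and `EXPECTED_ROOT` is taken from the example path above, so adjust it for your project. It compares paths as written and does not resolve symbolic links.

```python
import os
import sys


def is_under(path, root):
    """Return True if `path` is located inside the directory tree `root`."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    # commonpath compares whole path components, so /a/bc is not treated as inside /a/b.
    return os.path.commonpath([path, root]) == root


# Assumed location, taken from the example above.
EXPECTED_ROOT = "/global/common/software"

if is_under(sys.prefix, EXPECTED_ROOT):
    print(f"{sys.prefix} is on {EXPECTED_ROOT}")
else:
    print(f"{sys.prefix} is NOT on {EXPECTED_ROOT}")
```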
You can also change your default conda location to
/global/common/software. An easy way to do this is to change the settings in your .condarc file:
envs_dirs:
  - /global/common/software/<your project>/conda
pkgs_dirs:
  - /global/common/software/<your project>/conda
channels:
  - defaults
This will place all of your environments in this directory by default, and you won't have to worry about specifying the full prefix to your environment when installing it or activating it.
We are aware the project directory quotas on
/global/common/software are small. Please open a ticket at
help.nersc.gov if you need more space.