How to use Python on NERSC systems¶
There are 5 options for using and configuring your Python environment at NERSC. We provide a brief overview here and will explain each option in greater detail below.
- Module only
- Module + source activate
- Conda init + conda activate
- Install your own Python
- Use a Shifter container (best practice for 10+ nodes)
Small scale: Our data show that about 80 percent of our NERSC Python users are using custom conda environments (Options 2 and 3). You might find that these are a good solution for you, too.
Large scale: While Options 2 and 3 are well-suited for small scale jobs, they do not scale well. If you intend to run at large scale (10+ nodes), Shifter is the best option.
For more discussion about how to achieve good performance by choosing the right filesystems, please see here.
Option 1: Module only¶
In this mode, you just `module load python` and use it however you like. This is the simplest option but also the least flexible. If you require a package that is not in our default modules, this option will not work for you.
Who should use Option 1?
Option 1 is best for users who want to get started quickly and who do not require special libraries or custom packages.
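One quick way to judge whether Option 1 is enough is to check, after `module load python`, whether the packages your code needs can be imported. The following is a minimal sketch, not a NERSC tool: the function name `missing_packages` and the example package list are illustrative assumptions.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be imported."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical requirements list; replace with what your code imports.
    required = ["numpy", "scipy", "astropy"]
    absent = missing_packages(required)
    if absent:
        print("Not available in this Python:", ", ".join(absent))
    else:
        print("All required packages are available.")
```

If anything is reported as missing, a custom conda environment (Options 2 and 3) is probably the better fit.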
Option 2: Module + source activate¶
In this mode, you first `module load python` and then build and use a conda environment on top of our module. To use this method:
module load python
source activate myenv
To leave your environment, run `conda deactivate` and you will return to the base Python environment.
Who should use Option 2?
Option 2 is a good choice for any user who doesn't want a specific version of Python loaded automatically when they log on to Cori. It is also good for users who prefer to use the most recent Python module.
Option 3: Conda init + conda activate¶
In this mode, you are not actually using the Python module. Rather, you will configure your environment one time based on a Python module. This means that your configuration will not have variables like `PYTHONUSERBASE` set that help group pip packages in an organized fashion. Option 3 also doesn't include any safety checks that might prevent you from mixing Python environments. If these things are not an issue for you, you can configure your setup one time via:
module load python
conda init
This will add the following to your `.bashrc` file:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh" ]; then
        . "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh"
    else
        export PATH="/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
After you have configured your environment, when you log on to Cori you only need to run:
conda activate myenv
To leave your environment, run `conda deactivate` and you will return to the base Python environment.
What should you do if you decide you don't like Option 3? You can simply delete the lines that `conda init` has added to your `.bashrc` file and choose another Python option.
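To make "delete the lines" concrete, here is a hedged sketch that removes everything between the two marker comments `conda init` writes. It works on a string and only prints the result; the function name `strip_conda_init_block` and the sample text are assumptions for illustration, and you would want to back up your real `.bashrc` before changing it.

```python
START_MARKER = "# >>> conda initialize >>>"
END_MARKER = "# <<< conda initialize <<<"


def strip_conda_init_block(text):
    """Return `text` without the block managed by 'conda init'."""
    kept = []
    inside_block = False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == START_MARKER:
            # Start dropping lines, including the marker itself.
            inside_block = True
            continue
        if inside_block:
            if stripped == END_MARKER:
                inside_block = False
            continue
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Shortened, made-up example of a .bashrc file.
    sample = (
        "export EDITOR=vim\n"
        "# >>> conda initialize >>>\n"
        "# !! Contents within this block are managed by 'conda init' !!\n"
        "unset __conda_setup\n"
        "# <<< conda initialize <<<\n"
        "alias ll='ls -l'\n"
    )
    print(strip_conda_init_block(sample), end="")
```

Note that if the closing marker is missing, this sketch drops everything after the opening marker, which is one more reason to inspect the output before saving it.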
Who should use Option 3?
Option 3 is suitable for any user who would like a particular Python environment loaded by default whenever they access Cori. However, the user must be willing to manually monitor and update their configuration. Users should also be aware that they will need to manage their pip package installations themselves, for example by setting `PYTHONUSERBASE`. Users who choose Option 3 should not combine their conda-init configured Python environment with our NERSC Python modules.
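Because Option 3 has no built-in safety checks, a small self-check can help spot a mixed setup. The sketch below is only an illustration: the function name `environment_notes` is hypothetical, and it assumes that an active conda environment exports `CONDA_PREFIX` and that the interpreter's `sys.prefix` normally matches it.

```python
import os
import sys


def environment_notes(environ, prefix):
    """Return human-readable notes about a possibly mixed Python setup."""
    notes = []
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix is None:
        notes.append("No conda environment is active (CONDA_PREFIX is unset).")
    elif os.path.realpath(conda_prefix) != os.path.realpath(prefix):
        # The interpreter lives somewhere other than the active environment.
        notes.append(
            "The running interpreter (%s) is not the active conda "
            "environment (%s)." % (prefix, conda_prefix)
        )
    if not environ.get("PYTHONUSERBASE"):
        notes.append(
            "PYTHONUSERBASE is unset, so 'pip install --user' uses the "
            "default per-user location."
        )
    return notes


if __name__ == "__main__":
    for note in environment_notes(os.environ, sys.prefix):
        print(note)
```

An empty result does not prove the setup is clean; it only means these two simple checks found nothing.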
Option 4: Install your own Python¶
You don't have to use any of the Python options we described above. You are free to install your own Python via Miniconda, Anaconda, Intel Python, or a custom collaboration install to have complete control over your stack.
Collaborations, projects, or experiments may wish to install a shareable, managed Python stack to `/global/common/software` independent of the NERSC modules. You are welcome to use the Anaconda installer script for this purpose. In fact, you may want to consider the more "stripped-down" Miniconda installer as a starting point. That option allows you to start with only the bare essentials and build up. Be sure to select the Linux version in either case. For instance:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b \
    -p /global/common/software/myproject/env
[installation messages]
source /global/common/software/myproject/env/bin/activate
conda install <only-what-my-project-needs>
You can customize the path with the `-p` argument. Without it, the installation above would go to `$HOME/miniconda3`. You should also consider the `PYTHONSTARTUP` environment variable, which you may wish to unset altogether. It is mainly relevant to the system Python, which we advise against using.
Who should use Option 4?
Option 4 is suitable for individuals or collaborations who would like to install, maintain, and control their own Python stack. Users who choose Option 4 should not combine their custom Python installations with our NERSC Python modules.
Option 5: Install/Use Python inside a Shifter container¶
We strongly suggest this option for any user who needs to run Python on 10+ nodes. This will result in better performance for your own application, make you less vulnerable to filesystem slowdowns caused by other users, and of course prevent causing filesystem slowdowns for other users.
To get started using Docker containers, see here.
To use Docker containers at NERSC via Shifter, see here.
Coming soon: better and more complete examples of using Python in a Shifter container. In the meantime, please write to us at help.nersc.gov if you have questions or need guidance. We are happy to help you get your Python application running in a Shifter container.
Who should use Option 5?
Option 5 is suitable for users willing to build their own software stack inside of a container. Anyone who plans to run mpi4py jobs at scale is strongly encouraged to use Option 5. Please see here for more information.
Creating conda environments¶
Creating custom conda environments is usually quick and easy. If you require a package that is not available in our default module, this is the option you must use.
If you are using Option 2 (source activate):
module load python
conda create --name myenv python=3.8
source activate myenv
conda install numpy scipy astropy
If you are using Option 3 (conda activate):
conda create --name myenv python=3.8
conda activate myenv
conda install numpy scipy astropy
For better performance or if you plan to run your application at scale, consider installing your custom environment in your project's directory on `/global/common/software`:
conda create --prefix /global/common/software/myproject/myenv python=3.8
source activate /global/common/software/myproject/myenv
conda install numpy scipy astropy
We are aware that the project directory quotas on `/global/common/software` are small. Please open a ticket at help.nersc.gov if you need more space.
Installing libraries via conda channels¶
Conda has several default channels that will be used first for package installation. If you want to use another channel beyond the default channels, you can, but we suggest that you select your channel carefully.
Here is an example that demonstrates why your channels matter. If we run `conda install numpy`, conda will search the default channels first. This is good because it means that MKL-enabled NumPy will be installed, which generally performs well on Cori's Intel hardware.
If, however, you have added other channels to your search path, for example `conda-forge`, the packages that conda decides to install from `conda-forge` may not be optimal for NERSC. In this example, you will likely get a version of NumPy that uses OpenBLAS instead of MKL, and this can be substantially slower on Cori.
Don't permanently add other channels to your conda config, i.e.
conda config --add channels conda-forge
Do this instead:
conda install numpy --channel conda-forge
It's better to append the channel you need with `--channel conda-forge`. This uses `conda-forge` only when you ask for it and not all the time.
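To see which channel each package in an environment actually came from, one option is to group the records that `conda list --json` prints. The following is a rough sketch that assumes each record carries `name` and `channel` fields; the function name `packages_by_channel` and the sample records (including their version numbers) are made up for illustration.

```python
import json


def packages_by_channel(conda_list_json):
    """Group package names by the channel recorded for each package."""
    grouped = {}
    for record in json.loads(conda_list_json):
        # Fall back to "unknown" if a record has no channel field.
        channel = record.get("channel", "unknown")
        grouped.setdefault(channel, []).append(record["name"])
    return grouped


if __name__ == "__main__":
    # Illustrative input only; real output has more fields per package.
    sample = json.dumps(
        [
            {"name": "numpy", "version": "1.19.2", "channel": "pkgs/main"},
            {"name": "astropy", "version": "4.0.2", "channel": "conda-forge"},
        ]
    )
    for channel, names in sorted(packages_by_channel(sample).items()):
        print(channel, "->", ", ".join(sorted(names)))
```

In a real session the input string could come from running `conda list --json` in the activated environment, for instance through `subprocess.run(["conda", "list", "--json"], capture_output=True, text=True).stdout`.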
Installing libraries via pip¶
Pip is available under Anaconda Python. If you create a conda environment but you are unable to find a conda build of whatever package (or version of that package) you want to install, then pip is one viable alternative. However, pip users at NERSC should be aware of the following:
- Users of the pip command may want to use the `--user` flag for per-user site-package installation following the PEP 370 standard. On Linux systems this defaults to `$HOME/.local`, and packages can be installed to this path with `pip install --user package_name`. This can be overridden by defining the `PYTHONUSERBASE` environment variable.
- To prevent per-user site-package installations from conflicting across machines and module versions, at NERSC we have configured our Python modules so that `PYTHONUSERBASE` is set to `$HOME/.local/$NERSC_HOST/version`, where "version" corresponds to the version of the Python module loaded. Note that anyone using Option 3 will have to configure this themselves.
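The effect of `PYTHONUSERBASE` can be observed from Python itself. The sketch below builds a path in the `$HOME/.local/$NERSC_HOST/version` layout described above and asks a fresh interpreter which per-user base it reports when the variable is set. The helper names and the example host and version strings are assumptions for illustration, not the values the NERSC module necessarily sets.

```python
import os
import subprocess
import sys


def nersc_style_user_base(home, host, version):
    """Build a path of the form $HOME/.local/$NERSC_HOST/version."""
    return os.path.join(home, ".local", host, version)


def user_base_seen_by_python(user_base):
    """Report the per-user base a new interpreter uses with PYTHONUSERBASE set."""
    env = dict(os.environ, PYTHONUSERBASE=user_base)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "cori" and "3.7-anaconda-2019.10" are example values only.
    base = nersc_style_user_base(
        os.path.expanduser("~"), "cori", "3.7-anaconda-2019.10"
    )
    print("PYTHONUSERBASE :", base)
    print("Python reports :", user_base_seen_by_python(base))
```

Users of Option 3 could export a similar value in their shell startup file if they want per-machine, per-version separation of `pip install --user` packages.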
Mixing pip and conda: an example¶
We have observed that users often don't realize that the per-user site-package directories are included in the search path from all their conda environments created with the same module. What does this mean? We'll demonstrate with an example. If you have done the following:
module load python
pip install numpy --user
then any conda environment you have created based on this Python module will have this pip-installed NumPy in its search path.
It can be easy to forget you've done "pip install --user" and then create a new conda environment and be confused by how it works (or doesn't).
If you're using a conda environment anyway, think about whether you really want a pip-installed package to be accessible to multiple conda environments. If you don't, just drop the "--user" part and install it into your conda environment:
module load python
source activate myenv
pip install numpy
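If you are unsure whether a package in your conda environment is really coming from a forgotten `pip install --user`, you can compare its location with the per-user site-packages directory. This is a minimal sketch using only the standard library; the helper names `is_under` and `installed_in_user_site` are hypothetical.

```python
import importlib.util
import os
import site


def is_under(path, directory):
    """Return True if `path` is located inside `directory`."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


def installed_in_user_site(package):
    """Return True or False for an importable package, or None if not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return is_under(spec.origin, site.getusersitepackages())


if __name__ == "__main__":
    print("Per-user site-packages:", site.getusersitepackages())
    print("numpy from user site? ", installed_in_user_site("numpy"))
```

A result of `True` inside a conda environment suggests the package was picked up from the per-user directory rather than from the environment itself.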