Python at NERSC FAQ

Have a question about using Python on NERSC's supercomputers that you think others might have already asked? This FAQ may be useful to you. If you have suggestions, updates, or follow-up questions, open a ticket with NERSC Consulting.

Should I use Python 2 or Python 3?

Python 3! Python 2 reached its end of life on Jan 1, 2020. Python 2 will remain on Cori for now, but will not be available on Perlmutter. For more information, please see this page.

How can I checkpoint my code?

Checkpointing your code can make your workflow more robust to:

  • System issues. If your job crashes because of a system issue, you will be able to restart the checkpointed calculation in a resubmitted job later and it can pick up where it left off.
  • User error. The most common use case here is that the calculation takes longer than the user expected when the job was submitted, and doesn't finish before the time limit.
  • Preemption. Some HPC systems offer preemptable queues, where jobs run at a discounted charge because they may be interrupted by higher-priority jobs. If your code can checkpoint, it can tolerate being preempted, so you can take advantage of discounted charging or submit shorter jobs. The net effect may be faster throughput for your workflow.

This example repo demonstrates one simple way to add graceful error handling and checkpointing to Python code. Note that mpi4py jobs must be run with srun on Cori, which is suitable for checkpointing. For example:

srun -n 2 ./main.py

For checkpointing to work, other Python jobs must be run with exec:

exec ./main.py

so that the SIGINT signal will be forwarded. (Bash will not do this.) The InterruptHandler class in this example demonstrates how to catch SIGINT, checkpoint your work, and shut down if necessary.
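The general pattern can be sketched as follows. This is a minimal illustration and not the code from the example repo: the InterruptHandler below is a hypothetical stand-in that shares only the name, and save_checkpoint, load_checkpoint, run, and the JSON checkpoint format are all assumptions made for this sketch.

```python
import json
import os
import signal
import tempfile


class InterruptHandler:
    """Hypothetical sketch: record SIGINT instead of raising KeyboardInterrupt."""

    def __init__(self, sig=signal.SIGINT):
        self.sig = sig
        self.interrupted = False
        self._previous = None

    def __enter__(self):
        self.interrupted = False
        # Install our handler and remember the one that was active before.
        self._previous = signal.signal(self.sig, self._handle)
        return self

    def _handle(self, signum, frame):
        # Only set a flag; the main loop decides when it is safe to stop.
        self.interrupted = True

    def __exit__(self, exc_type, exc, tb):
        # Restore the previous handler if it was installed from Python.
        if self._previous is not None:
            signal.signal(self.sig, self._previous)
        return False


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path, n_steps):
    """Toy calculation that resumes from the checkpoint and stops on SIGINT."""
    state = load_checkpoint(path)
    with InterruptHandler() as handler:
        while state["step"] < n_steps:
            state["total"] += state["step"]  # stand-in for real work
            state["step"] += 1
            if handler.interrupted:
                break  # stop at a step boundary so the state is consistent
    save_checkpoint(path, state)
    return state
```

Calling run again with the same checkpoint path picks up at the saved step rather than starting over. A long-running code would probably also checkpoint periodically inside the loop instead of only on exit.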

Can NERSC install [some Python package] for me?

Users sometimes contact NERSC to ask if a Python package could be installed with NERSC-maintained Python (i.e., Python installed by NERSC staff at /usr/common/software and available via module load). We consider three broad guidelines in making a decision:

  • General utility. It makes sense for NERSC to focus support on packages that are broadly useful to the most users. At the same time we are happy to help individual users install more specialized packages for their own use. (See below.)
  • Maintenance activity. We prefer to install packages that are actively maintained by community-engaged developers. This way, if we run into problems we can engage with developers to arrive at a solution quickly. Abandoned projects may also pose a security risk, and we discourage users from relying on such packages altogether.
  • Ease of installation. Python packages are usually straightforward to install, but in cases where the build system is effectively broken and we cannot debug the problem, we may need to wait quite some time for the developer to address the issue.

Actively maintained, easy-to-install packages that a large number of users will find useful are the most likely candidates for NERSC support. Requests for packages that only a single user or a small number of users need are likely to be met with a suggestion that the requester manage installation themselves. Abandoned packages will not be installed, but we may suggest alternatives. Refer to the software policy page for more information.

How can my collaboration install and share Python?

Collaborations, projects, or experiments may wish to install a shareable, managed Python stack to /global/common/software independent of the NERSC modules. You are welcome to use the Anaconda installer script for this purpose. You may also want to consider the more "stripped-down" Miniconda installer as a starting point. That option allows you to start with only the bare essentials and build up. Be sure to select the Linux version in either case. For instance:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b \
    -p /global/common/software/myproject/env
[installation messages]
source /global/common/software/myproject/env/bin/activate
conda install <only-what-my-project-needs>

You can customize the path with the -p argument. Without it, the installation above would go to $HOME/miniconda3.

Attention

When using your own Anaconda/Miniconda installation, be sure not to load any NERSC-provided Python modules. Also consider the PYTHONSTARTUP environment variable, which you may wish to unset altogether. It is mainly relevant to the system Python, which we advise against using.

Note that to activate the root environment, technically you should use the source shell command. Setting PATH to the root environment bin directory works but the source/conda tool does more than that.
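One way to check that the intended installation is actually the one in use is to ask the interpreter itself. The sketch below is illustrative only: environment_report is a hypothetical helper name, and it relies on the assumption that conda activation sets the CONDA_PREFIX environment variable to the active environment's directory.

```python
import os
import sys


def environment_report(environ=None):
    """Hypothetical helper: summarize which Python is running and how it was set up."""
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")
    return {
        # Path of the interpreter that is executing this code.
        "executable": sys.executable,
        # Installation (or environment) directory of that interpreter.
        "prefix": sys.prefix,
        # Directory of the activated conda environment, if any.
        "conda_prefix": conda_prefix,
        # True only when the running interpreter lives in the activated environment.
        "interpreter_matches_conda_prefix": (
            conda_prefix is not None
            and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix)
        ),
        # Variables that can change interpreter behavior behind your back.
        "pythonstartup": environ.get("PYTHONSTARTUP"),
        "pythonpath": environ.get("PYTHONPATH"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If interpreter_matches_conda_prefix is False after activation, or pythonstartup is set unexpectedly, a NERSC-provided module or a leftover setting may still be influencing the session.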

Can I use virtualenv on Cori?

The virtualenv tool is not compatible with the conda tool used for maintaining Anaconda Python. But this is not necessarily bad news as conda is an excellent replacement for virtualenv and addresses many of its shortcomings. And of course, there is nothing preventing you from doing a from-source installation of Python of your own, and then using virtualenv if you prefer.

What if I can't find a conda package I need?

Conda package builds are provided through namespaces called channels. At NERSC, we try to stick to packages from the defaults channel as much as possible. Doing so helps us maintain a coherent Anaconda installation to guarantee that all the installed packages work together (a release). But sometimes the defaults channel doesn't provide a version of a package we need. In these cases we tend to use pip (see also) after installing dependencies using the conda tool where possible.

If you want to use another channel beyond the defaults channel, you can, but we suggest that you select your channel carefully. We've found that there isn't much guidance in terms of which channels are actively maintained or exactly who is managing them. Sometimes it's obvious from the name. Other times a developer community creates a channel of its own but you have to reach out to developers to find the right one.

To search for a package beyond the defaults channel, use the Anaconda client tool. For example, to see channels providing AstroPy:

module load python
anaconda search -t conda astropy

Be sure to look for builds for the "linux-64" platform.

Can I use "pip" to install my own packages?

Yes. Pip is available under Anaconda Python. If you create a conda environment but you are unable to find a conda build of whatever package (or version of that package) you want to install, then pip is one viable alternative. The other alternative is to try a different channel (see also).

Users of the pip command may want to use the "--user" flag for per-user site-package installation following the PEP 370 standard. On Linux systems the per-user base directory defaults to $HOME/.local, and packages can be installed under this path with "pip install --user package_name". The default location can be overridden by defining the PYTHONUSERBASE environment variable.

Note

To prevent per-user site-package installations from conflicting across machines and module versions, at NERSC we have configured our Python modules so that PYTHONUSERBASE is set to $HOME/.local/$NERSC_HOST/version where "version" corresponds to the version of the Python module loaded.

Note

We have observed that users often don't realize that the per-user site-package directories are included in the search path from all their conda environments created with the same module. It can be easy to forget you've done "pip install --user" and then create a new conda environment and be confused by how it works (or doesn't). If you're using a conda environment anyway, think about whether you really want a pip-installed package to be accessible to multiple conda environments. If you don't, just drop the "--user" part and install it into your conda environment.
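To see whether the per-user site-package directory is in play for the interpreter you are running, the standard library site module can be queried directly. This is a small sketch; describe_user_site and loaded_from_user_site are hypothetical helper names introduced here for illustration.

```python
import os
import site
import sys


def describe_user_site():
    """Hypothetical helper: report the per-user directories for this interpreter."""
    user_site = site.getusersitepackages()
    return {
        # Base directory; honors PYTHONUSERBASE when it is set.
        "user_base": site.getuserbase(),
        # Where "pip install --user" places packages for this interpreter.
        "user_site": user_site,
        # Whether the interpreter is configured to use the user site directory.
        "enabled": bool(site.ENABLE_USER_SITE),
        # Whether that directory is actually on the import search path.
        "on_sys_path": user_site in sys.path,
    }


def loaded_from_user_site(module):
    """Return True if the module's file lives under the per-user site directory."""
    path = getattr(module, "__file__", None)
    if path is None:
        return False  # built-in modules have no file on disk
    user_site = os.path.join(site.getusersitepackages(), "")
    return os.path.abspath(path).startswith(user_site)


if __name__ == "__main__":
    for key, value in describe_user_site().items():
        print(f"{key}: {value}")
```

Running this inside a conda environment and passing an imported package to loaded_from_user_site shows whether that package came from the environment or from an earlier "pip install --user". Python also provides the -s command-line option and the PYTHONNOUSERSITE environment variable to leave the per-user directory off the search path.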

Can I install my own Anaconda Python "from scratch"?

Yes. One reason you might consider this is that you want to install Anaconda Python on /global/common/software or in a Shifter image to improve launch-time performance for large-scale applications. Or you might want more complete control over what versions of packages are installed and don't want to worry about whether NERSC will upgrade packages to versions that break backwards compatibility you depend on. See here for more information on how you can do this.

Can I use mpi4py from my Anaconda environment?

Yes, you can use mpi4py in your own custom Conda environment. See this page for detailed instructions.

How do I use the Intel Distribution for Python at NERSC?

Create a conda environment for your Intel Distribution for Python installation:

module load python
conda create -n idp -c intel intelpython3_core python=3
source activate idp

Should I use Anaconda or Intel Distribution for Python?

Intel Math Kernel Library (MKL), Data Analytics Acceleration Library (DAAL), Threading Building Blocks (TBB), and Integrated Performance Primitives (IPP) are available through Intel Community Licensing. This enabled both Continuum Analytics and Intel to provide access to Intel's performance libraries through Python for free starting in late 2015 and early 2016. In terms of performance the two distributions are about the same.

Python includes source code licensed under GPL and this constrains the Intel Python distribution somewhat. Most importantly, interactive use of Python or IPython under the Intel Distribution is provided without GNU readline. There does not yet appear to be a viable non-GPL alternative. We suggest that most users will find that Anaconda Python provides the best of both worlds, though it may lag slightly behind the Intel Distribution in terms of performance for short periods. The two companies have a very strong relationship and we think this greatly benefits Python users in the long term.

How can I profile my Python code's performance?

Check out this page dedicated to both generic Python profiling and Python profiling on Cori.