Using Perlmutter

Perlmutter is not a production resource

Perlmutter is not a production resource and usage is not charged against your allocation of time. While we will attempt to make the system available to users as much as possible, it is subject to unannounced and unexpected outages, reconfigurations, and periods of restricted access. Please visit the timeline page for more information about changes we've made in our recent upgrades.

Current Known Issues

Known Issues on Perlmutter

Access

Perlmutter is not yet available for general user access.

If you have a GPU-ready code and would like access to Perlmutter, please fill out the Perlmutter Request for Access Form and your request will be evaluated.

Connecting to Perlmutter

You can connect directly to Perlmutter with

ssh perlmutter-p1.nersc.gov

or

ssh saul-p1.nersc.gov

You can also connect to Perlmutter by first logging in to Cori or a DTN and then running ssh perlmutter.

Connecting to Perlmutter with sshproxy

If you have an ssh key generated by sshproxy, you can configure your local computer's ~/.ssh/config file as suggested in the webpage section SSH Configuration File Options.
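
As a hedged illustration only (the username elvis is a placeholder, and the key path assumes sshproxy's default output location), a minimal entry in your local ~/.ssh/config might look like:

    # Placeholder username and key path; sshproxy writes its key to ~/.ssh/nersc by default
    Host perlmutter
        HostName perlmutter-p1.nersc.gov
        User elvis
        IdentityFile ~/.ssh/nersc

With such an entry in place, ssh perlmutter from your local machine should pick up the sshproxy-generated key automatically.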

Transferring Data to / from Perlmutter Scratch

Perlmutter scratch is only accessible from Perlmutter login or compute nodes.

For small transfers you can use scp on a Perlmutter login node.

For larger transfers you can set up a Globus Connect Personal endpoint on a Perlmutter login node.

Larger datasets can also be staged on the Community File System (which is available on Perlmutter) with Globus, cp, or rsync on a Data Transfer Node. Once the data is on the Community File System, you can use cp or rsync on a Perlmutter login node to copy the data to Perlmutter scratch.
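
As a rough sketch of these two paths (the username, project directory, and file names below are placeholders, not actual paths for your account):

    # Small transfer: scp through a Perlmutter login node directly to your scratch directory
    scp small_input.dat elvis@perlmutter-p1.nersc.gov:/path/to/your/perlmutter/scratch/

    # Larger transfer: first stage the data on the Community File System from a Data Transfer Node...
    rsync -av big_dataset/ /global/cfs/cdirs/yourproject/big_dataset/

    # ...then, from a Perlmutter login node, copy it to Perlmutter scratch
    cp -r /global/cfs/cdirs/yourproject/big_dataset /path/to/your/perlmutter/scratch/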

Caveats on the system

Last Updated: Dec 10th, 2021.

  • Static compilation isn't officially supported by NERSC, but we have outlined some instructions under the static compilation section in the compiler wrappers documentation page.
  • collabsu is not available. Please create a direct login with sshproxy to log in to Perlmutter, or switch to a collaboration account on Cori and then log in to Perlmutter from there.
  • MPI/mpi4py users may notice an mlx5 error that stems from spawning forks within an MPI rank, which is considered undefined/unsupported behavior.
  • PrgEnv-gnu users building a CUDA-enabled code (gcc and nvcc) may load the cpe-cuda module to get a version of gcc compatible with the respective cudatoolkit installation. Please see our gcc compatibility section for additional details.
  • Users may notice MKL-based CPU code runs more slowly. Please try module load fast-mkl-amd.

Preparing for Perlmutter

Please check the Transitioning Applications to Perlmutter webpage for a wealth of useful information on how to transition your applications for Perlmutter.

Compiling/Building Software

You can find information below on how to compile your code on Perlmutter:

Programming Environment & Cray Wrappers

There are several HPE/Cray-provided programming environments available on Perlmutter, with varying levels of support for GPU (A100) code generation: PrgEnv-(nvidia, cray, gnu, aocc). Each environment provides compilers for C, C++, and Fortran. To compile your code in any environment, you must always use the Cray wrappers (cc, CC, ftn), which combine the native compilers of that environment with Cray MPI and the various other libraries needed to run your application successfully on Perlmutter.
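
For example, a minimal build with the wrappers might look like the following sketch (the source file names are placeholders); because the wrappers already pull in Cray MPI and the system libraries, no extra include or link flags are needed for them:

    cc  -O2 -o hello_mpi.x hello_mpi.c    # C
    CC  -O2 -o hello_mpi.x hello_mpi.cpp  # C++
    ftn -O2 -o hello_mpi.x hello_mpi.f90  # Fortran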

Accessing Older Programming Environments

Generally we recommend that you use the most recent programming environment installed on Perlmutter. However, sometimes it is convenient to have access to previous programming environments to check things like compile options and libraries, etc. You can use module load cpe/YY.XX to load the previous programming environment from year YY and month XX. We will remove cpe modules for environments that no longer work on our system due to changes in underlying dependencies like network libraries.

Load correct cpe-cuda module for gcc and nvcc compatibility

In PrgEnv-gnu, if you wish to use the cpe-cuda module to resolve the gcc and nvcc compatibility issue, load the cpe-cuda version that matches your cpe module.

For example, if cpe/YY.XX is loaded, use module load cpe-cuda/YY.XX, replacing YY and XX with the matching year and month.
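
As an illustrative sketch only (the 21.11 version string is hypothetical; run module avail cpe to see which versions are actually installed):

    module load PrgEnv-gnu
    module load cpe/21.11        # hypothetical example version
    module load cpe-cuda/21.11   # must match the cpe version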

Please keep in mind that these cpe modules are offered as a convenience. If you require reproducibility across environments, we encourage you to investigate container-based options like Shifter.

Compiling GPU applications on the system

On Perlmutter, users have access to the HPE/Cray-provided cudatoolkit, which includes GPU-accelerated libraries, profiling tools (Nsight Compute and Nsight Systems), a C/C++ compiler, and a runtime library to build and deploy your application. This module is not loaded by default; users can load the specific version of the cudatoolkit module that matches the nvcc (CUDA compiler) version their application needs.

Details about cudatoolkit modules

There are three cudatoolkit modules per nvhpc (NVIDIA HPC SDK) release, each giving users access to a specific CUDA version and its associated libraries for that version of nvhpc. For example, the cudatoolkit/21.9_11.4 module for SDK nvhpc/21.9 gives you access to CUDA v11.4.1 and the CUDA APIs supported in that release.

If you are looking for a specific CUDA version that does not match a cudatoolkit version (such as 11.3), please consider trying the CUDA Compatibility Libraries, which are available in all cudatoolkit modules. These are meant to make later CUDA versions backward compatible with earlier versions. For example, you could load cudatoolkit/21.9_11.4, which should enable you to run a code using CUDA 11.3.
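
For example, selecting a cudatoolkit might look like the following (the version shown is the current default mentioned above; the installed versions on the system may differ):

    module avail cudatoolkit            # list the installed cudatoolkit modules
    module load cudatoolkit/21.9_11.4   # CUDA v11.4.1, from the nvhpc/21.9 SDK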

Note

You can only have one cudatoolkit module loaded at a time.

A cudatoolkit module must be loaded to compile any GPU code on the system for any programming environment.

GPU-aware MPI

HPE/Cray MPI (Cray MPICH) is a CUDA-aware MPI implementation, which allows programmers to use pointers to GPU device memory in MPI buffers. See the CUDA-Aware MPI section for an example code.

You must set MPICH_GPU_SUPPORT_ENABLED to use CUDA-aware MPI

To use CUDA-aware MPI at runtime you must set export MPICH_GPU_SUPPORT_ENABLED=1 in your batch script or interactive session. Without this setting, you may get a segfault.

Building your application with CUDA-aware MPI

In addition to linking your application with the HPE/Cray MPI wrappers (cc / CC), you must also link it with the HPE/Cray GPU Transport Layer (GTL) library, which enables the use of CUDA-aware MPI. To achieve this, the accelerator target must be set to nvidia80 during your compilation step, in one of the following ways (see the build-and-run sketch below):

  • module load craype-accel-nvidia80 (one of the cudatoolkit modules must be loaded before loading this module), or,
  • set environment variable: export CRAY_ACCEL_TARGET=nvidia80, or,
  • pass the compiler flag -target-accel=nvidia80

If you fail to do one of the above steps, your application will not be able to use CUDA-aware MPI / GPUDirect RDMA, and at runtime you will see errors like:

MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
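
Putting these pieces together, a hedged build-and-run sketch might look like the following (the source file name, task counts, and GPU counts are placeholders for your own application and job geometry):

    module load cudatoolkit
    module load craype-accel-nvidia80    # sets the accelerator target so the wrapper links the GTL library
    cc -o mpi_gpu_app.x mpi_gpu_app.c

    export MPICH_GPU_SUPPORT_ENABLED=1   # required at runtime for CUDA-aware MPI
    srun -n 8 --gpus-per-node=4 ./mpi_gpu_app.x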

Known issues with CUDA-aware MPI

Bug in Cray MPICH may require GPU binding for jobs with many MPI ranks

Due to an outstanding bug with our vendor, users with many MPI ranks may also require GPU binding. This is because the MPI ranks are incorrectly allocating GPU memory, and too many MPI ranks that allocate this memory will cause the program to segfault (this segfault might happen during execution, or before the first statement is executed, and may happen only when multiple nodes are used). One workaround is to use gpu-binding to evenly spread the allocated memory. Here is an example of using gpu-binding in a 4 node job:

srun --ntasks=32 --ntasks-per-node=8 -G 4 --gpu-bind=single:2 python -m mpi4py.bench helloworld

Even with GPU binding, users may find that the number of MPI ranks they can use within a job is limited. Note that this also impacts CPU-only code that uses CUDA-aware MPI. We expect a fix for this problem soon.

We currently recommend using either PrgEnv-nvidia or PrgEnv-gnu to compile your applications, based on the current level of A100 support available through each of these environments.

PrgEnv-nvidia

Under the nvidia programming environment, the host compilers available for C, C++, and Fortran applications are nvc, nvc++, and nvfortran. You can see the version of each compiler by running:

C Compiler:

cc --version 

nvc 21.9-0 64-bit target on x86-64 Linux -tp zen-64 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

C++ Compiler:

CC --version 

nvc++ 21.9-0 64-bit target on x86-64 Linux -tp zen-64 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Fortran Compiler:

ftn --version 

nvfortran 21.9-0 64-bit target on x86-64 Linux -tp zen-64 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Note

As you can see above, the Cray wrappers (cc, CC, ftn) in PrgEnv-nvidia point to the respective host compilers nvc, nvc++, and nvfortran. You must use these wrappers to compile your application.

Compiling CUDA code with Cray wrappers

The host compilers nvc / nvc++ (accessible through the cc / CC wrappers) in the NVIDIA SDK have opt-in CUDA support. To compile a single-source C / C++ code (host and device code in the same source file) with the Cray wrappers, you must add the -cuda flag to the compilation step, which notifies the nvc / nvc++ compiler to accept CUDA runtime APIs. Omitting the -cuda flag will result in your application compiling without any of the CUDA API calls and will generate an executable with undefined behavior. If you omit the flag, you will also notice warnings like:

nvc-Warning-The -gpu option has no effect unless a language-specific option to enable GPU code generation is used (e.g.: -acc, -mp=gpu, -stdpar, -cuda)
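
For example, a single-source C++/CUDA file (the file name here is a placeholder) could be compiled under PrgEnv-nvidia with:

    CC -cuda -o vecadd.x vecadd.cpp   # -cuda lets nvc++ accept CUDA runtime API calls
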
OpenMP & OpenACC GPU-Offload

To compile an OpenMP application one must pass the -mp=gpu flag in the compilation step. For additional details on OpenMP support on the system please see the OpenMP section in our Perlmutter readiness page.

To compile an OpenACC application one must pass the -acc flag in the compilation step. For additional details on OpenACC support on the system please see the OpenACC section in our Perlmutter readiness page.

To enable offloading support with these programming models, you must set the accelerator target to nvidia80 to allow code generation for the A100s. Ways to enable this (see the sketch after this list):

  • module load craype-accel-nvidia80 (one of the cudatoolkit modules must be loaded before loading this module), or,
  • set environment variable: export CRAY_ACCEL_TARGET=nvidia80, or,
  • pass the compiler flag -target-accel=nvidia80
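
A compilation sketch under PrgEnv-nvidia, using the craype module to set the accelerator target (a cudatoolkit module must already be loaded; the source file names are placeholders):

    module load craype-accel-nvidia80
    cc -mp=gpu -o omp_offload.x omp_offload.c   # OpenMP target offload
    cc -acc    -o acc_offload.x acc_offload.c   # OpenACC offload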

PrgEnv-gnu

Under the gnu programming environment, the host compilers available for C, C++, and Fortran applications are gcc, g++, and gfortran. Switching to PrgEnv-gnu will reload certain modules to match the Cray MPI installation and other libraries.

module load PrgEnv-gnu

Lmod is automatically replacing "nvidia/21.9" with "gcc/11.2.0".

Lmod is automatically replacing "PrgEnv-nvidia/8.2.0" with "PrgEnv-gnu/8.2.0".

Due to MODULEPATH changes, the following have been reloaded:
1) cray-mpich/8.1.11

As with PrgEnv-nvidia, the compiler wrappers (cc, CC, ftn) now point to the corresponding host compilers in PrgEnv-gnu:

C Compiler:

cc --version

gcc (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

C++ Compiler:

CC --version

g++ (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Fortran Compiler:

ftn --version

GNU Fortran (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

GCC compatibility with the nvcc compiler

When using the PrgEnv-gnu environment in conjunction with the cudatoolkit module (i.e., when compiling any application for both the host and device side), note that not every version of gcc is compatible with every version of nvcc. The current Perlmutter default cudatoolkit version, cudatoolkit/21.9_11.4 (note: the nvcc version for this release is v11.4.1), supports GCC 11.x (see the document outlining supported host compilers for each nvcc installation).

You may use the cpe-cuda module available on the system to automatically downgrade the gcc version to gcc/9.3.0, or manually load gcc/10.3.0 if you wish to use a different cudatoolkit (e.g., cudatoolkit/21.9_11.0 only supports up to GCC 10.x, and you must downgrade the gcc version after loading the cudatoolkit module).

If using the cpe-cuda module, it must be loaded after loading the PrgEnv-gnu:

    module load PrgEnv-gnu
    module load cudatoolkit
    module load cpe-cuda
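
With the modules above loaded, a standalone CUDA source could then be compiled with nvcc, which picks up the compatible gcc from your environment as its host compiler (the source file name is a placeholder; sm_80 targets the A100):

    nvcc -arch=sm_80 -o kernel_test.x kernel_test.cu
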
OpenMP & OpenACC GPU-Offload

We do not recommend using OpenMP / OpenACC GPU-Offloading with PrgEnv-gnu.

Additional useful resources

Running Jobs

Perlmutter uses Slurm for batch job scheduling. During Allocation Year 2021, jobs run on Perlmutter will be free of charge.

Tip

To run a job on Perlmutter GPU nodes, you must submit the job using a project GPU allocation account name, which ends in _g (e.g., m9999_g). An account name without the trailing _g is for charging CPU jobs on Cori and Phase 2 CPU-only nodes.
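
As a hedged sketch only, a minimal GPU batch script might look like the following; the account name m9999_g is the placeholder from the tip above, and the constraint, node/GPU counts, time limit, and executable name are assumptions, so please check the Slurm pages linked below for the current queue configuration:

    #!/bin/bash
    # Assumed values: check the NERSC Slurm documentation for the current constraint
    # and limits. The account m9999_g is the placeholder project GPU account.
    #SBATCH --account=m9999_g
    #SBATCH --constraint=gpu
    #SBATCH --nodes=1
    #SBATCH --gpus-per-node=4
    #SBATCH --time=00:30:00

    # Only needed if the application uses CUDA-aware MPI (see the GPU-aware MPI section)
    export MPICH_GPU_SUPPORT_ENABLED=1
    srun --ntasks-per-node=4 --gpus-per-node=4 ./my_gpu_app.x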

Below you can find general information on how to submit jobs using Slurm and monitor jobs, etc.: