Cori Large Memory Nodes (aka
Cori has a set of 20 nodes, each with 2 TB of memory and two 3.0 GHz AMD EPYC 7302 (Rome) processors. The nodes are available to high-priority scientific or technical campaigns that have a special need for this hardware. The initial focus is on supporting COVID-19 related research and preparing for the Perlmutter system (which will have a similar AMD processor).
As stated above, there are twenty large memory nodes:
cmem20. Each of these nodes contains
- Two sockets, each populated with one 16-core AMD EPYC 7302 (Rome) processor running at 3.0 GHz
- Theoretical double-precision peak speed of 48 Gflops per core and 1.536 Tflops per node
- 2 TB of RAM
- 3 TB of NVMe SSD local scratch disk, mounted as
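The peak figures in the list above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes each Zen 2 core retires 16 double-precision floating-point operations per cycle (two 256-bit FMA units, 4 doubles each, 2 operations per FMA); the constant and function names are illustrative only.

```python
# Theoretical double-precision peak for one large memory node.
CLOCK_GHZ = 3.0          # clock speed quoted in the node description
FLOPS_PER_CYCLE = 16     # assumption: 2 FMA units x 4 doubles x 2 ops per FMA
CORES_PER_SOCKET = 16
SOCKETS_PER_NODE = 2


def peak_gflops_per_core(clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak Gflops of a single core: clock rate times flops per cycle."""
    return clock_ghz * flops_per_cycle


def peak_tflops_per_node():
    """Peak Tflops of a whole node: per-core peak times the core count."""
    cores = CORES_PER_SOCKET * SOCKETS_PER_NODE
    return peak_gflops_per_core() * cores / 1000.0


if __name__ == "__main__":
    print(f"{peak_gflops_per_core():.0f} Gflops per core")
    print(f"{peak_tflops_per_node():.3f} Tflops per node")
```

These are theoretical upper bounds; real applications typically achieve only a fraction of them.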
AMD EPYC processors use a multi-chip module (MCM) design in which separate dies are provided for CPU and I/O components for easier scalability. The CPU dies are called CCDs (Core Complex Dies) and the I/O dies are called IODs.
An AMD Zen2 core in the Rome processor can support Simultaneous Multithreading (SMT), allowing 2 execution threads (aka hardware threads) to execute simultaneously per core. Each core has its own 32-KB L1 data and 512-KB L2 caches.
In the general Zen2 design, four cores share a single 16-MB L3 cache and are grouped as a modular unit called a Core Complex (CCX). On the EPYC 7302, however, only 2 cores per CCX are active, so each 16-MB L3 cache is shared by just those two cores.
A CCD contains two CCXs, as depicted in the diagram below.
The EPYC 7302 processor has four CCDs and one IOD per socket, as shown below. All dies interconnect with each other via AMD's Infinity Fabric, sometimes referred to as the Global Memory Interconnect (GMI).
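The hierarchy described above (cores in a CCX, CCXs in a CCD, CCDs in a socket) determines the core and cache totals for a node. The following sketch simply multiplies the counts out; the class and attribute names are hypothetical, and the hardware-thread total applies only if SMT is enabled on the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RomeTopology:
    """Counts taken from the description of the EPYC 7302 nodes."""
    sockets: int = 2
    ccds_per_socket: int = 4
    ccxs_per_ccd: int = 2
    active_cores_per_ccx: int = 2
    threads_per_core: int = 2     # SMT: 2 hardware threads per core
    l3_mb_per_ccx: int = 16

    @property
    def ccxs_per_socket(self) -> int:
        return self.ccds_per_socket * self.ccxs_per_ccd

    @property
    def cores_per_socket(self) -> int:
        return self.ccxs_per_socket * self.active_cores_per_ccx

    @property
    def cores_per_node(self) -> int:
        return self.cores_per_socket * self.sockets

    @property
    def hardware_threads_per_node(self) -> int:
        return self.cores_per_node * self.threads_per_core

    @property
    def l3_mb_per_socket(self) -> int:
        return self.ccxs_per_socket * self.l3_mb_per_ccx


if __name__ == "__main__":
    topo = RomeTopology()
    print("cores per socket:", topo.cores_per_socket)
    print("cores per node:", topo.cores_per_node)
    print("hardware threads per node:", topo.hardware_threads_per_node)
    print("L3 per socket (MB):", topo.l3_mb_per_socket)
```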
The CCDs connect to memory, I/O, and each other through the IOD. A Rome processor supports 8 memory controllers. Each memory controller supports 2 DIMMs (3200 MT/s DDR4), for a maximum memory bandwidth of 204.8 GB/s per socket (8 channels × 3200 MT/s × 8 bytes per transfer), or 409.6 GB/s per two-socket node.
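The theoretical memory bandwidth follows from the number of memory channels and the DDR4 transfer rate. Here is a minimal sketch of that arithmetic, assuming one channel per memory controller and the standard 64-bit (8-byte) DDR4 data bus per channel; the names are illustrative.

```python
# Theoretical peak memory bandwidth from channel count and transfer rate.
CHANNELS_PER_SOCKET = 8       # one channel per memory controller (assumption)
TRANSFERS_PER_SEC_M = 3200    # DDR4-3200: 3200 mega-transfers per second
BYTES_PER_TRANSFER = 8        # 64-bit data bus per DDR4 channel
SOCKETS_PER_NODE = 2


def bandwidth_gb_per_s_per_socket():
    """Channels x mega-transfers/s x bytes per transfer, converted to GB/s."""
    mb_per_s = CHANNELS_PER_SOCKET * TRANSFERS_PER_SEC_M * BYTES_PER_TRANSFER
    return mb_per_s / 1000.0


def bandwidth_gb_per_s_per_node():
    """Both sockets combined."""
    return bandwidth_gb_per_s_per_socket() * SOCKETS_PER_NODE


if __name__ == "__main__":
    print(f"{bandwidth_gb_per_s_per_socket():.1f} GB/s per socket")
    print(f"{bandwidth_gb_per_s_per_node():.1f} GB/s per node")
```

Sustained bandwidth measured by benchmarks will be lower than this theoretical limit.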
The IOD can be configured for different NUMA node topologies. In the case of the EPYC 7302 processor, it can be configured for 4, 2, or 1 NUMA nodes per socket, or for a single NUMA domain spanning both sockets. These configurations are denoted NPS4, NPS2, NPS1, and NPS0, respectively. In addition, there is an option to expose each L3 cache as a NUMA node, in which case a large memory node would have 16 NUMA nodes.
The current configuration for the large memory nodes is NPS1.
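To make the NPS settings concrete, the sketch below counts the NUMA nodes that a two-socket large memory node would expose under each configuration. The function name and the `"L3"` label for the L3-as-NUMA option are hypothetical, chosen only for illustration.

```python
SOCKETS = 2
L3_CACHES_PER_SOCKET = 8   # 4 CCDs x 2 CCXs, one L3 cache per CCX


def numa_nodes_per_machine(mode: str) -> int:
    """Number of NUMA nodes on a two-socket node for a given setting."""
    if mode == "NPS0":
        # One NUMA domain spanning both sockets.
        return 1
    if mode in ("NPS1", "NPS2", "NPS4"):
        # NPSn means n NUMA nodes per socket.
        return int(mode[3:]) * SOCKETS
    if mode == "L3":
        # Hypothetical label: each L3 cache exposed as its own NUMA node.
        return L3_CACHES_PER_SOCKET * SOCKETS
    raise ValueError(f"unknown NUMA configuration: {mode!r}")


if __name__ == "__main__":
    for mode in ("NPS0", "NPS1", "NPS2", "NPS4", "L3"):
        print(mode, numa_nodes_per_machine(mode))
```

With the NPS1 setting, this gives two NUMA nodes per machine, one per socket. On a running system, tools such as `numactl --hardware` or `lscpu` (if installed) report the layout actually in effect.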
The usual Cori file systems are available, including
Each node also has
- A 3 TB local
This file system can be used for fast I/O with the input and output files of your runs. As a proxy for the I/O speed of the file system, we use IOR. Below are some MPI-IO rates in GB/sec from 32-process runs on one node, with a transfer size and block size of 1 MB. SSF denotes a single shared file used for collective I/O, and FPP denotes a separate file per process.
| Aggregate file size | SSF read | SSF write | FPP read | FPP write |
Applications with different I/O patterns or running on a shared node may see different results.
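The difference between the SSF and FPP access patterns can be illustrated with a small serial sketch. It is not IOR and does not use MPI: a loop stands in for the ranks, and the shared-file layout (each rank writing one block per segment at a rank-dependent offset) is an assumption modeled on a segmented block layout. All function and file names are hypothetical.

```python
import os
import tempfile


def ssf_offset(rank, segment, nprocs, block_size):
    """Byte offset of a rank's block within a segment of the shared file."""
    return (segment * nprocs + rank) * block_size


def write_ssf(directory, nprocs, segments, block_size):
    """Single shared file: every rank writes into the same file."""
    path = os.path.join(directory, "shared.dat")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        for rank in range(nprocs):          # stands in for MPI ranks
            payload = bytes([rank]) * block_size
            for seg in range(segments):
                os.pwrite(fd, payload, ssf_offset(rank, seg, nprocs, block_size))
    finally:
        os.close(fd)
    return path


def write_fpp(directory, nprocs, segments, block_size):
    """File per process: every rank writes its own file sequentially."""
    paths = []
    for rank in range(nprocs):              # stands in for MPI ranks
        path = os.path.join(directory, f"rank{rank:04d}.dat")
        with open(path, "wb") as f:
            for _ in range(segments):
                f.write(bytes([rank]) * block_size)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        shared = write_ssf(workdir, nprocs=4, segments=2, block_size=1024)
        per_rank = write_fpp(workdir, nprocs=4, segments=2, block_size=1024)
        print("shared file bytes:", os.path.getsize(shared))
        print("files per process:", len(per_rank))
```

In both patterns the aggregate amount of data is the same; what differs is whether the ranks contend for one file or each own a separate one.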
Files in /tmp are not persistent: they are removed when your job finishes.
The large memory nodes are not connected to Cori’s Aries high-speed network. Multi-node applications can use Open MPI to communicate over an InfiniBand network.
The user environment is similar to a Cori login node. However, the large memory nodes have AMD processors, unlike Cori, which has Intel processors. You will need to (re)compile your codes to run on the AMD hardware.
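Because binaries built for the Intel-based Cori nodes may not be appropriate for the AMD nodes, a run script can check which kind of processor it has landed on. This is a minimal sketch that assumes a Linux `/proc/cpuinfo` file, in which AMD processors report `AuthenticAMD` and Intel processors report `GenuineIntel` in the `vendor_id` field; the function names are hypothetical.

```python
def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def read_cpu_vendor(path="/proc/cpuinfo"):
    """Read the vendor string of the machine this script is running on."""
    with open(path) as f:
        return cpu_vendor(f.read())


def is_amd(vendor):
    """True if the vendor string identifies an AMD processor."""
    return vendor == "AuthenticAMD"
```

For example, a script could call `is_amd(read_cpu_vendor())` and select the AMD build of an application only when it returns `True`.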
Access to the Large Memory Nodes
You can request access by filling out this form. Projects performing COVID-19 related research or with strong scientific use cases that have a need for the nodes’ architectural characteristics will be prioritized.
These nodes are accessed from a Cori login node via a Slurm batch job. If you are granted use of the large memory nodes, your account will be associated with the QOS settings for the large memory nodes. For details on this and on how to submit jobs, please check the Slurm Access section.