Cori KNL Processor Modes
The Xeon-Phi "Knights-Landing" 7250 processors in Cori have 68 CPU cores, which are organized into 34 "tiles" (each tile comprising two CPU cores and a shared 1 MB L2 cache). The tiles are placed in a 2D mesh and connected via an on-chip interconnect, as shown in the following figure:
As shown in the figure, the KNL processor has 6 DDR channels, with controllers to the right and left of the mesh, and 8 MCDRAM channels, with controllers spread across the 4 "corners" of the mesh.
NUMA on KNL
A KNL processor maintains cache coherency with a set of tag directories distributed across the tiles such that any memory address corresponds to the tag directory cache on a particular tile. KNL supports several modes of memory access organization, which are well-described in this article.
The Cori KNL nodes are in "quadrant" mode, in which the chip is divided into four quadrants, and the tag directories in a quadrant map to memory accessed via a memory controller in that quadrant. In quadrant mode, the whole chip is presented as a single NUMA domain. The diagram below illustrates how a cache miss on one tile is resolved in quadrant mode.
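One way to check how many NUMA domains a Linux node exposes is to count the `nodeN` entries under `/sys/devices/system/node`. The following is a minimal sketch, not a NERSC-provided tool: the function name `numa_nodes` is hypothetical, and the expectation that a quadrant-mode node with MCDRAM configured as cache reports a single domain is inferred from the description above rather than from captured output.

```python
import os
import re


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node ids listed under sysfs_root (Linux only).

    Returns an empty list if the directory does not exist, for example on a
    non-Linux system.
    """
    if not os.path.isdir(sysfs_root):
        return []
    pattern = re.compile(r"node(\d+)$")
    ids = []
    for name in os.listdir(sysfs_root):
        match = pattern.match(name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


# Assumption: run on a compute node, not a login node, to see the KNL layout.
nodes = numa_nodes()
print(f"NUMA domains visible: {len(nodes)} (ids: {nodes})")
```

If the node is in quadrant mode with MCDRAM as cache, the list would be expected to contain one id. Modes that expose MCDRAM as addressable memory or that split the chip into clusters would show more.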
MCDRAM Memory Options on KNL
There is no shared L3 cache on the KNL processor. However, the 16 GB of MCDRAM (spread over 8 channels) can be configured either as a direct-mapped cache or as addressable memory. On Cori KNL nodes, the MCDRAM is configured as a direct-mapped cache.
In this configuration, recently accessed data is automatically cached in MCDRAM, similarly to an L3 cache on a Xeon processor. However, there are some notable differences:
- The cache (16 GB) is significantly larger than a typical L3 cache on a Xeon processor (usually in the tens of MB).
- The cache is direct-mapped, meaning it is non-associative: each cache line's worth of data in DRAM has exactly one location where it can be cached in MCDRAM. This can lead to conflicts for applications with working sets larger than 16 GB.
- Data is not prefetched into the MCDRAM cache.
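The direct-mapped behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions: it uses a 64-byte cache line and a plain modulo mapping from physical address to cache slot. The actual hardware mapping is not described here and may differ, and the names `mcdram_slot` and `conflicts` are hypothetical.

```python
CACHE_LINE_BYTES = 64              # assumed cache-line size
MCDRAM_BYTES = 16 * 2**30          # 16 GB of MCDRAM, taken as 16 GiB
NUM_SLOTS = MCDRAM_BYTES // CACHE_LINE_BYTES


def mcdram_slot(phys_addr):
    """Toy mapping: cache-line index modulo the number of cache slots."""
    return (phys_addr // CACHE_LINE_BYTES) % NUM_SLOTS


def conflicts(addr_a, addr_b):
    """True when two addresses in different lines compete for the same slot."""
    line_a = addr_a // CACHE_LINE_BYTES
    line_b = addr_b // CACHE_LINE_BYTES
    return line_a != line_b and mcdram_slot(addr_a) == mcdram_slot(addr_b)


a = 0x1000
b = a + MCDRAM_BYTES               # exactly one cache size further on

print(conflicts(a, b))                     # True: both map to one slot
print(conflicts(a, a + CACHE_LINE_BYTES))  # False: neighboring lines
```

In this model, two lines whose addresses differ by a multiple of 16 GiB evict each other, because a direct-mapped cache has no second place to keep either one. A set-associative cache could hold both.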