# Queues and Charges
This page details the QOS and queue usage policies, with limits and worked charge examples for each system and queue.
When a job runs on a NERSC supercomputer, charges accrue against one of the user's projects. The unit of accounting for these charges is the "Node Hour", based on the performance of the nodes on Perlmutter. The total number of charged hours for a job is a function of:
- the number of nodes and the walltime used by the job,
- the QOS of the job, and
- the "charge factor" for the system upon which the job was run.
Job charging policies, including the intended use of each queue, are outlined in more detail under "Policies". This page summarizes the limits and charges applicable to each queue.
## Selecting a Queue

Jobs are submitted to different queues depending on the queue constraints and the user's desired outcomes. Each queue corresponds to a "Quality of Service" (QOS) with its own service level: priority, run and submit limits, walltime limits, node-count limits, and cost. At NERSC, the terms "queue" and "QOS" are often used interchangeably.
Most jobs are submitted to the "regular" queue, but a user with a particularly urgent scientific emergency may decide to submit to the premium queue for faster turnaround, while another user who does not need the results for many weeks may elect to use the low queue to cut down on costs. A user who needs fast turnaround while operating a telescope could prearrange with NERSC to use the realtime queue for those runs. The user with the scientific emergency incurs a higher-than-regular charge to use the premium queue, while a user who can be flexible about turnaround time is rewarded with a substantial discount.
## Assigning Charges
Users who are members of more than one project can select which one should be charged for their jobs by default. In Iris, under the "Compute" tab in the user view, select the project you wish to make default.
To charge to a non-default project, use the `-A projectname` flag in Slurm, either in the `#SBATCH` directives preamble of your script, e.g.,

```
#SBATCH -A myproject
```

or on the command line when you submit your job, e.g., `sbatch -A myproject ./myscript.sl`.
!!! warning
    For users who are members of multiple NERSC projects, charges are made to the default project, as set in Iris, unless the `#SBATCH --account=<NERSC project>` flag has been set.
## Calculating Charges
The cost of a job is computed in the following manner: $$ \text{walltime in hours} \times \text{number of nodes} \times \text{QOS factor} \times \text{charge factor} $$.
!!! example
    The charge for a job that runs for 40 minutes on 3 Haswell nodes in the premium QOS (QOS factor of 2) would be calculated as $$ \frac{40\ \text{min}}{60\ \text{min/hr}} \times 3\ \text{nodes} \times 2 \times 0.35\ \text{charged-hours/node-hour} = \frac{2}{3} \times 3 \times 2 \times 0.35 = 1.4\ \text{charged hours}. $$
!!! example
    A job that ran for 35 minutes on 3 KNL nodes on Cori with the regular QOS (QOS factor of 1) would be charged: $$ \frac{35}{60}\ \text{hours} \times 3\ \text{nodes} \times 1 \times 0.2 = 0.35\ \text{charged hours} $$
!!! note
    Jobs in the "shared" QOS are charged only for the fraction of the node they use.
!!! example
    A job that ran for 12 hours on 4 physical cores (each core has 2 hyperthreads) on Cori Haswell with the shared QOS would be charged: $$ 12\ \text{hours} \times \frac{2 \times 4\ \text{cores}}{64} \times 0.35 = 0.525\ \text{charged hours} $$
!!! note
    Jobs are charged according to the resources they made unavailable to other jobs, i.e., the number of nodes reserved (regardless of use) and the actual walltime used (regardless of the specified limit).
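As a sketch, the charging formula and the shared-node rule above can be combined in a few lines of Python (the function name and arguments are illustrative, not a NERSC API):

```python
def job_charge(hours, nodes, qos_factor, charge_factor, node_fraction=1.0):
    """Charged hours for a job.

    node_fraction models the "shared" QOS: the slice of the node
    actually occupied (hyperthreads used / hyperthreads per node).
    Exclusive-node jobs use the default of 1.0.
    """
    return hours * nodes * node_fraction * qos_factor * charge_factor

# 40 min on 3 Haswell nodes, premium QOS (factor 2), charge factor 0.35
print(job_charge(40 / 60, 3, 2, 0.35))  # ~1.4 charged hours

# 35 min on 3 KNL nodes, regular QOS (factor 1), charge factor 0.2
print(job_charge(35 / 60, 3, 1, 0.2))  # ~0.35 charged hours

# 12 h on 4 physical cores (8 of 64 hyperthreads) of a shared Haswell node
print(job_charge(12, 1, 1, 0.35, node_fraction=8 / 64))  # ~0.525 charged hours
```

The three calls reproduce the worked examples above.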
## Charge Factors

Charge factors for Allocation Year 2022 are being renormalized around the performance of Perlmutter CPU and GPU nodes.

| Architecture | Charge Factor | Conversion: AY21 to AY22 |
|---|---|---|
| Cori Haswell | 0.35 | Multiply by 0.0025 or divide by 400 |
| Cori KNL | 0.2 | Multiply by 0.0025 or divide by 400 |
| Cori Large Memory Nodes (`cmem`, `bigmem`) | 0.35 | Multiply by 0.0025 or divide by 400 |
| Perlmutter CPU[^1] | 1 | N/A |
| Perlmutter GPU[^1] | 1 | N/A |
!!! note
    Perlmutter GPU is allocated separately from the rest of the resources.
## QOS Cost Factor: Charge Multipliers and Discounts

The QOS cost factor depends on the queue a job runs in. A job that must run urgently can be submitted to the premium queue, incurring a 2x QOS factor. Jobs in the flex queue, on the other hand, receive a substantial discount in exchange for flexibility about walltime.
| QOS | QOS Factor | Conditions |
|---|---|---|
| regular | 1 | (standard charge factor) |
| flex | 0.25 | uses Cori KNL nodes |
| flex | 0.5 | uses Cori Haswell nodes |
| premium | 2 | less than 20% of the allocation has been used in the premium queue |
| premium | 4 | more than 20% of the allocation has been used in the premium queue |
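The escalating premium multiplier can be expressed as a small helper (illustrative only; the table does not specify the rate at exactly 20%, so this sketch assumes the higher rate applies there):

```python
def premium_qos_factor(premium_hours_used, allocation_hours):
    """QOS factor for the premium queue: 2x while less than 20% of
    the project's allocation has been spent in premium, 4x after."""
    return 2 if premium_hours_used / allocation_hours < 0.2 else 4

print(premium_qos_factor(100, 1000))  # 2: only 10% spent in premium
print(premium_qos_factor(300, 1000))  # 4: 30% spent in premium
```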
## QOS Limits and Charges

### Perlmutter GPU

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|
| regular | - | 12 | 5000 | - | medium | 1 | 1 |
| interactive | 4 | 4 | 5000 | 2 | high | 1 | 1 |
| jupyter | 4 | 6 | 1 | 1 | high | 1 | 1 |
| debug | 8 | 0.5 | 5 | 2 | medium | 1 | 1 |
| preempt | 128 | 24 (preemptible after two hours) | 5000 | - | medium | 0.25 | 0.25 |
| overrun | - | 12 | 5000 | - | very low | 0 | 0 |
| realtime | custom | custom | custom | custom | very high | 1 | 1 |
- Nodes allocated by a "regular" QOS job are used exclusively by that job.
- Jobs in the preemptible queue can be preempted after two hours. Jobs can be automatically requeued after preemption using the `--requeue` sbatch flag. See the Preemptible Jobs section for details.
- NERSC's JupyterHub uses the "jupyter" QOS to start Jupyter notebook servers on compute nodes. Other uses of this QOS are not authorized, and it is monitored for unauthorized use.
- Jobs may run on the "standard" Perlmutter GPU nodes or on the subset of GPU nodes which have double the GPU-attached memory. To specifically request these higher-bandwidth memory nodes, use `-C gpu&hbm80g` in your job script instead of `-C gpu`. Jobs with this constraint must use 256 or fewer nodes.
### Perlmutter CPU

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|
| regular | - | 12 | 5000 | - | medium | 1 | 1 |
| interactive | 4 | 4 | 5000 | 2 | high | 1 | 1 |
| jupyter | 4 | 6 | 1 | 1 | high | 1 | 1 |
| debug | 8 | 0.5 | 5 | 2 | medium | 1 | 1 |
| shared[^3] | 0.5 | 12 | 5000 | - | medium | 1 | 1[^3] |
| preempt | 128 | 24 (preemptible after two hours) | 5000 | - | medium | 0.5 | 0.5 |
| overrun | - | 12 | 5000 | - | very low | 0 | 0 |
| realtime | custom | custom | custom | custom | very high | 1 | 1 |
- Nodes allocated by a "regular" QOS job are used exclusively by that job.
- Even though there is no node limit for the regular queue, not all of the projected 3072 nodes are currently available. Please check the state of the nodes in the regular queue with `sinfo -s -p regular_milan_ss11`.
- Jobs in the preemptible queue can be preempted after two hours. Jobs can be automatically requeued after preemption using the `--requeue` sbatch flag. See the Preemptible Jobs section for details.
- NERSC's JupyterHub uses the "jupyter" QOS to start Jupyter notebook servers on compute nodes. Other uses of this QOS are not authorized, and it is monitored for unauthorized use.
### Perlmutter Login

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|
| xfer | 1 (login) | 48 | 100 | 15 | low | - | 0 |
| cron | 1/128 (login) | 24 | - | - | low | - | 0 |
| workflow | 0.25 (login) | 2160 | - | - | low | - | 0 |
### Cori Haswell

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|
| regular | 512/1932[^2] | 48 | 5000 | - | 4 | 1 | 0.35 |
| shared[^3] | 0.5 | 48 | 10000 | - | 4 | 1 | 0.35[^3] |
| interactive | 64[^4] | 4 | 2 | 2 | - | 1 | 0.35 |
| debug | 64 | 0.5 | 5 | 2 | 3 | 1 | 0.35 |
| premium | 1772 | 48 | 5 | - | 2 | 2 -> 4[^5] | 0.70[^5] |
| flex | 64 | 48 | 5000 | - | 6 | 0.5 | 0.175 |
| overrun | 1772 | 4 | 5000 | - | 5 | 0 | 0 |
| xfer | 1 (login) | 48 | 100 | 15 | - | - | 0 |
| bigmem | 1 (login) | 72 | 100 | 1 | - | 1 | 0.35 |
| realtime | custom | custom | custom | custom | 1 | custom | custom |
| compile | 1 (login) | 24 | 5000 | 2 | - | - | 0 |
### Cori KNL

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour |
|---|---|---|---|---|---|---|---|
| regular | Full System[^6] | 48 | 5000 | - | 4 | 1 | 0.2 |
| interactive | 64[^4] | 4 | 2 | 2 | - | 1 | 0.2 |
| debug | 512 | 0.5 | 5 | 2 | 3 | 1 | 0.2 |
| premium | Full System[^6] | 48 | 5 | - | 2 | 2 -> 4[^5] | 0.4[^5] |
| low | Full System[^6] | 48 | 5000 | - | 5 | 0.5 | 0.1 |
| flex | 256 | 48 | 5000 | - | 6 | 0.25 | 0.05 |
| overrun | Full System[^6] | 4 | 5000 | - | 7 | 0 | 0 |
### JGI Accounts

There are 192 Haswell nodes reserved for the "genepool" and "genepool_shared" QOSes combined. Jobs run with the "genepool" QOS use these nodes exclusively. Jobs run with the "genepool_shared" QOS can share nodes.

| QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority |
|---|---|---|---|---|---|
| genepool | 16 | 72 | 500 | - | 3 |
| genepool_shared | 0.5 | 72 | 500 | - | 3 |
## Discounts

- **Big job discount**: the "regular" QOS charges on Cori KNL are discounted by 50% if a job uses 1024 or more nodes. This discount is available only in the regular QOS for Cori KNL.

| System Architecture | Big Job Discount | Conditions |
|---|---|---|
| Cori KNL | 0.5 | Job using 1024 or more nodes in the regular queue |
In addition, several QOSes offer reduced charging rates:

- The "low" QOS (available on Cori KNL only) is charged 50% of the "regular" QOS rate, but the big job discount does not apply.
- The "flex" QOS is charged 50% of the "regular" QOS rate on Haswell and 25% of the "regular" QOS rate on KNL.
- The "overrun" QOS is free of charge and is only available to projects that are out of allocation time. Please refer to the overrun section for more details.
[^1]: Allocation Year 2022 charge factors are normalized to the performance of Perlmutter nodes.
[^2]: Batch jobs submitted to the Haswell partition requesting more than 512 nodes must go through a compute reservation.
[^3]: Jobs in the "shared" QOS are charged only for the fraction of the node they use.
[^4]: Batch job submission is not enabled, and the 64-node limit applies per project, not per user.
[^5]: The premium QOS factor increases from 2 to 4 once more than 20% of the project's allocation has been used in the premium queue.
[^6]: At any time a subset of KNL nodes will be unavailable, either down or reserved for other queues. You can check the maximum currently requestable number of nodes with `sinfo -p regularx_knl -o "%D %T"`; the total number of `allocated` and `idle` nodes is the practical limit you can request.