Queues and Charges

This page details the QOS and queue usage policies. Examples for each type of Cori job are available.


When a job runs on a NERSC supercomputer, charges accrue against one of the user's projects. The unit of accounting for these charges is the "Node Hour", based on the performance of the nodes on Perlmutter. The total number of charged hours for a job is a function of:

  • the number of nodes and the walltime used by the job,
  • the QOS of the job, and
  • the "charge factor" for the system upon which the job was run.

Job charging policies, including the intended use of each queue, are outlined in more detail under "Policies". This page summarizes the limits and charges applicable to each queue.

Selecting a Queue

Jobs are submitted to different queues depending on the queue constraints and the user's desired outcomes. Each queue corresponds to a "Quality of Service" (QOS) with a different service level in terms of priority, run and submit limits, walltime limits, node-count limits, and cost. At NERSC, the terms "queue" and "QOS" are often used interchangeably.

Most jobs are submitted to the "regular" queue, but a user facing an urgent scientific deadline may submit to the premium queue for faster turnaround. Another user who does not need the results of a run for many weeks may elect to use the low queue to cut down on costs. And a user who needs fast turnaround while a telescope is observing could prearrange with NERSC to use the realtime queue for those runs. The user with the urgent deadline will incur a higher-than-regular charge to use the premium queue, while a user who can be flexible about turnaround time is rewarded with a substantial discount.

Assigning Charges

Users who are members of more than one project can select which one should be charged for their jobs by default. In Iris, under the "Compute" tab in the user view, select the project you wish to make default.

To charge to a non-default project, use the -A projectname flag in Slurm, either in the Slurm directives preamble of your script, e.g.,

#SBATCH -A myproject

or on the command line when you submit your job, e.g., sbatch -A myproject ./myscript.sl.
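
For context, a complete (if minimal) batch script with the account directive in place might look like the sketch below; the project name, QOS, resource request, and executable are placeholders rather than recommendations.

#!/bin/bash
#SBATCH -A myproject       # project to charge (placeholder name)
#SBATCH -q regular         # QOS to submit to
#SBATCH -C cpu             # architecture constraint (here, Perlmutter CPU nodes)
#SBATCH -N 2               # number of nodes (example value)
#SBATCH -t 01:00:00        # walltime limit

srun ./my_app              # placeholder executable

Submitting this script with sbatch ./myscript.sl charges the resulting node hours to myproject rather than to the default project.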

Warning

For users who are members of multiple NERSC projects, charges are made to the default project, as set in Iris, unless the #SBATCH --account=<NERSC project> flag has been set.

Calculating Charges

The cost of a job is computed in the following manner: $$ \text{walltime in hours} \times \text{number of nodes} \times \text{QOS factor} \times \text{charge factor} $$.

Example

The charge for a job that runs for 40 minutes on 3 Cori Haswell nodes in the premium QOS (QOS factor of 2) would be calculated as $$ \frac{40\ \text{mins}}{60\ \text{min/hr}} \times 3\ \text{nodes} \times 2 \times 0.35\ \text{charged-hours/node-hour} = \frac{2}{3} \times 3 \times 2 \times 0.35 = 1.4\ \text{charged hours}.$$

Example

A job which ran for 35 minutes on 3 KNL nodes on Cori with the regular QOS would be charged: $$ \frac{35}{60}\ \text{hours} \times 3\ \text{nodes} \times 0.2 = 0.35\ \text{charged hours} $$

Note

Jobs in the "shared" QOS are only charged for the fraction of the node used.

Example

A job which ran for 12 hours on 4 physical cores (each core has 2 hyperthreads, and a Haswell node has 64 hyperthreads in total) on Cori Haswell with the shared QOS would be charged: $$ 12\ \text{hours} \times (2 \times 4\ \text{cores}/64) \times 0.35 = 0.525\ \text{charged hours} $$

Note

Jobs are charged according to the resources they made unavailable for other jobs, i.e., the number of nodes reserved (regardless of use) and the actual walltime used (regardless of the specified limit).
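
The arithmetic in the examples above can be reproduced with a small shell helper. The function below is only an illustrative sketch (its name and interface are made up here; it is not a NERSC tool): it simply evaluates walltime x nodes x QOS factor x charge factor.

# Hypothetical helper: charged hours for a job, given
# walltime in minutes, node count (or node fraction), QOS factor, and charge factor.
charge_hours() {
    awk -v t="$1" -v n="$2" -v q="$3" -v c="$4" \
        'BEGIN { printf "%.3f\n", (t / 60) * n * q * c }'
}

charge_hours 40 3 2 0.35        # premium Cori Haswell example above -> 1.400
charge_hours 35 3 1 0.2         # regular Cori KNL example above -> 0.350
charge_hours 720 0.125 1 0.35   # shared example: 12 hours on 1/8 of a Haswell node -> 0.525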

Charge Factors

Charge factors for Allocation Year 2022 are renormalized to the performance of Perlmutter CPU and GPU nodes.

Architecture | Charge Factor | Conversion: AY21 to AY22
Cori Haswell | 0.35 | Multiply by 0.0025 or divide by 400
Cori KNL | 0.2 | Multiply by 0.0025 or divide by 400
Cori Large Memory Nodes (cmem, bigmem) | 0.35 | Multiply by 0.0025 or divide by 400
Perlmutter CPU¹ | 1 | N/A
Perlmutter GPU¹ | 1 | N/A

Note

Perlmutter GPU is allocated separately from the rest of the resources.

QOS Cost Factor: Charge Multipliers and Discounts

The QOS cost factor is a function of the queue a job runs in. If a job must be run urgently, a user might submit it to the premium queue and incur a 2x QOS factor. Jobs in the flex queue, on the other hand, receive a substantial discount in exchange for flexibility about walltime.

QOS | QOS Factor | Conditions
regular | 1 | (standard charge factor)
flex | 0.25 | uses Cori KNL nodes
flex | 0.5 | uses Cori Haswell nodes
premium | 2 | less than 20% of allocation has been used in premium queue
premium | 4 | more than 20% of allocation has been used in premium queue

QOS Limits and Charges

Perlmutter GPU

QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour
regular | - | 12 | 5000 | - | medium | 1 | 1
interactive | 4 | 4 | 5000 | 2 | high | 1 | 1
jupyter | 4 | 6 | 1 | 1 | high | 1 | 1
debug | 8 | 0.5 | 5 | 2 | medium | 1 | 1
preempt | 128 | 24 (preemptible after two hours) | 5000 | - | medium | 0.25 | 0.25
overrun | - | 12 | 5000 | - | very low | 0 | 0
realtime | custom | custom | custom | custom | very high | 1 | 1
  • Nodes allocated to a job in the "regular" QOS are used exclusively by that job.

  • Jobs in the preemptible queue can be preempted after two hours. Jobs can be automatically requeued after preemption using the --requeue sbatch flag. See the Preemptible Jobs section for details.

  • NERSC's JupyterHub uses the "jupyter" QOS to start Jupyter notebook servers on compute nodes. Other uses of the QOS are currently not authorized, and the QOS is monitored for unauthorized use.

  • Jobs may run on the "standard" Perlmutter GPU nodes or on the subset of GPU nodes which have double the GPU-attached memory. To specifically request these larger-memory nodes, use -C gpu&hbm80g in your job script instead of -C gpu. Jobs with this constraint must use 256 or fewer nodes. (A sample job script follows this list.)
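
As an illustration, a job script targeting these larger-memory GPU nodes might look like the sketch below. Only the QOS name and the -C gpu&hbm80g constraint come from this page; the account, node count, walltime, GPU count, and executable are placeholder assumptions.

#!/bin/bash
#SBATCH -A myproject           # project to charge (placeholder name)
#SBATCH -q regular             # regular QOS from the table above
#SBATCH -C gpu&hbm80g          # request the nodes with double the GPU-attached memory
#SBATCH -N 16                  # example node count (must be 256 or fewer with this constraint)
#SBATCH -t 06:00:00            # example walltime, within the 12-hour regular limit
#SBATCH --gpus-per-node=4      # assumes 4 GPUs per node; adjust to your application

srun ./my_gpu_app              # placeholder executable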

Perlmutter CPU

QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour
regular | - | 12 | 5000 | - | medium | 1 | 1
interactive | 4 | 4 | 5000 | 2 | high | 1 | 1
jupyter | 4 | 6 | 1 | 1 | high | 1 | 1
debug | 8 | 0.5 | 5 | 2 | medium | 1 | 1
shared³ | 0.5 | 12 | 5000 | - | medium | 1 | 1³
preempt | 128 | 24 (preemptible after two hours) | 5000 | - | medium | 0.5 | 0.5
overrun | - | 12 | 5000 | - | very low | 0 | 0
realtime | custom | custom | custom | custom | very high | 1 | 1
  • Nodes allocated to a job in the "regular" QOS are used exclusively by that job.

  • Even though there is no node limit for the regular queue, not all of the projected 3072 nodes are available today. Please check the current state of the nodes in the regular queue with sinfo -s -p regular_milan_ss11.

  • Jobs in the preemptible queue can be preempted after two hours. Jobs can be automatically requeued after preemption using the --requeue sbatch flag. See the Preemptible Jobs section for details, and the sample script after this list.

  • NERSC's JupyterHub uses the "jupyter" QOS to start Jupyter notebook servers on compute nodes. Other uses of the QOS are currently not authorized, and the QOS is monitored for unauthorized use.
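
For the preemptible QOS specifically, a minimal sketch combining -q preempt with automatic requeueing might look like the following. The QOS name, the 0.5 charge factor, and the --requeue flag come from this page; the account, resources, and application are placeholder assumptions, and the application itself should be able to resume from a checkpoint after preemption.

#!/bin/bash
#SBATCH -A myproject       # project to charge (placeholder name)
#SBATCH -q preempt         # preemptible QOS (0.5 charge factor on CPU nodes)
#SBATCH -C cpu             # Perlmutter CPU nodes
#SBATCH -N 4               # example node count (up to 128 in this QOS)
#SBATCH -t 24:00:00        # up to 24 hours, but the job may be preempted after two hours
#SBATCH --requeue          # automatically requeue the job if it is preempted

srun ./my_app              # placeholder executable; should checkpoint and restart cleanly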

Perlmutter Login

QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour
xfer | 1 (login) | 48 | 100 | 15 | low | - | 0
cron | 1/128 (login) | 24 | - | - | low | - | 0
workflow | 0.25 (login) | 2160 | - | - | low | - | 0

Cori Haswell

QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour
regular | 512/1932² | 48 | 5000 | - | 4 | 1 | 0.35
shared³ | 0.5 | 48 | 10000 | - | 4 | 1 | 0.35³
interactive | 64⁴ | 4 | 2 | 2 | - | 1 | 0.35
debug | 64 | 0.5 | 5 | 2 | 3 | 1 | 0.35
premium | 1772 | 48 | 5 | - | 2 | 2 -> 4⁵ | 0.70⁵
flex | 64 | 48 | 5000 | - | 6 | 0.5 | 0.175
overrun | 1772 | 4 | 5000 | - | 5 | 0 | 0
xfer | 1 (login) | 48 | 100 | 15 | - | - | 0
bigmem | 1 (login) | 72 | 100 | 1 | - | 1 | 0.35
realtime | custom | custom | custom | custom | 1 | custom | custom
compile | 1 (login) | 24 | 5000 | 2 | - | - | 0

Cori KNL

QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority | QOS Factor | Charge per Node-Hour
regular | Full System⁶ | 48 | 5000 | - | 4 | 1 | 0.2
interactive | 64⁴ | 4 | 2 | 2 | - | 1 | 0.2
debug | 512 | 0.5 | 5 | 2 | 3 | 1 | 0.2
premium | Full System⁶ | 48 | 5 | - | 2 | 2 -> 4⁵ | 0.4⁵
low | Full System⁶ | 48 | 5000 | - | 5 | 0.5 | 0.1
flex | 256 | 48 | 5000 | - | 6 | 0.25 | 0.05
overrun | Full System⁶ | 4 | 5000 | - | 7 | 0 | 0

JGI Accounts

There are 192 Haswell nodes reserved for the "genepool" and "genepool_shared" QOSes combined. Jobs run with the "genepool" QOS use these nodes exclusively. Jobs run with the "genepool_shared" QOS can share nodes. (A sample node-sharing submission follows the table below.)

QOS | Max nodes | Max time (hrs) | Submit limit | Run limit | Priority
genepool | 16 | 72 | 500 | - | 3
genepool_shared | 0.5 | 72 | 500 | - | 3
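
As an illustration only, a node-sharing JGI submission might look like the sketch below; the account name, core count, walltime, and command are placeholder assumptions, and any additional site-specific directives are omitted.

#!/bin/bash
#SBATCH -A myjgiproject        # JGI project to charge (placeholder name)
#SBATCH -q genepool_shared     # share a node with other jobs
#SBATCH -n 1                   # one task...
#SBATCH -c 4                   # ...using 4 cores of the node (example value)
#SBATCH -t 12:00:00            # example walltime, within the 72-hour limit

srun ./my_pipeline_step        # placeholder executable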

Discounts

  • Big job discount: The "regular" QOS charges on Cori KNL are discounted by 50% if a job uses 1024 or more nodes. This discount is available only in the regular QOS for Cori KNL.

    System Architecture | Big Job Discount | Conditions
    Cori KNL | 0.5 | Job using 1024 or more nodes in the regular queue
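
Example

Applying the charging formula above with this discount, a job that runs for 1 hour on 1024 Cori KNL nodes in the regular QOS would be charged: $$ 1\ \text{hour} \times 1024\ \text{nodes} \times 1 \times 0.2 \times 0.5 = 102.4\ \text{charged hours} $$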

In addition, several QOSes offer reduced charging rates:

  • The "low" QOS (available on Cori KNL only) is charged at 50% of the "regular" QOS rate, but the big job discount does not additionally apply.

  • The "flex" QOS is charged at 50% of the "regular" QOS rate on Cori Haswell and at 25% of the "regular" QOS rate on Cori KNL.

  • The "overrun" QOS is free of charge and is only available to projects that are out of allocation time. Please refer to the overrun section for more details.


  1. Charging began for Perlmutter on October 28, 2022. 

  2. Batch jobs submitted to the Haswell partition requesting more than 512 nodes must go through a compute reservation. 

  3. Shared jobs are only charged for the fraction of the node resources used. 

  4. Batch job submission is not enabled and the 64-node limit applies per project not per user. 

  5. The charge factor for "premium" QOS will be doubled once a project has spent more than 20 percent of its allocation in "premium". 

  6. At any time a subset of KNL nodes will be unavailable due to being down or reserved for other queues. You can check the maximum currently-requestable number of nodes with sinfo -p regularx_knl -o "%D %T"; the total number of allocated and idle nodes is the practical limit on what you can request.