Reservations¶
Users can request a scheduled reservation of machine resources if their jobs have special needs that cannot be accommodated through the regular batch system. A reservation dedicates a portion of the machine to a specific user or project for an agreed-upon duration. Typically this is used for interactive debugging at scale or for real-time processing linked to an experiment or event.
Note
Reservations are not intended to be used to guarantee fast throughput for production runs.
Charging¶
For normal batch jobs, charging against a project's allocation is done on a per-job basis. For scheduled reservations, however, the entire time during which resources were reserved (and therefore unavailable to other users) is charged, regardless of the number of nodes actively used or the time spent running jobs. If a reservation is terminated early, the project's allocation is charged only for the time that the reservation was active. For example, a two-node, five-hour reservation that was canceled at the end of the third hour would be charged 6 node-hours (2 nodes times 3 hours).
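The arithmetic above can be sketched as a small shell helper. This is only an illustration: `reservation_charge` is a hypothetical function, not a NERSC tool, and it assumes whole hours and ignores any charge factors that may apply to a project's allocation.

```shell
#!/usr/bin/env bash
# Sketch: estimate the node-hours charged for a reservation.
# Assumption: charge = (reserved nodes) x (whole hours the reservation was active).
# reservation_charge is a hypothetical helper name, not an existing command.
reservation_charge() {
    local nodes=$1 active_hours=$2
    echo $(( nodes * active_hours ))
}

# Two-node, five-hour reservation canceled at the end of the third hour:
reservation_charge 2 3

# The same reservation left active for its full five hours:
reservation_charge 2 5
```

The first call reproduces the 6 node-hours from the example. The second shows that the full window is charged if the reservation is not ended early, whether or not jobs were running.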
Requesting a reservation¶
To reserve compute nodes, a request must be submitted with at least one week's notice. Please ask for the smallest amount of resources you need, and as far in advance as possible, to minimize the impact on other users. We also recommend scheduling reservations to start during NERSC business hours so that staff are available if any issues arise.
Cancellations¶
A reservation must be canceled with at least 4 days' notice. If you have not received confirmation that your cancellation was received and fewer than 4 days remain until your start time, you must contact NERSC operations at 1-800-666-3772 (or 1-510-486-8600), menu option 1, to confirm.
Viewing reservations¶
To view all reservations, run scontrol show reservations or scontrol show res. (Actually, any abbreviation of the word reservations starting with res will work.) The output consists of one entry per reservation name (the unique identifier of the reservation, which is used with the --reservation option to access the reservation via sbatch or salloc). Key reservation fields such as StartTime, EndTime, Duration, Nodes, Users, and Accounts can provide you with an understanding of the reservation and its constraints.
$ scontrol show reservations
ReservationName=debug StartTime=2023-01-15T09:34:30 EndTime=2024-01-15T09:34:30 Duration=365-00:00:00
Nodes=nid[001080-001081,003220-003221,003764-003765,004196-004199,005284-005287,005368-005371,006184-006187,006296,006520-006523,006792-006795,006852-006855,006920-006923,006996-006999] NodeCnt=43 CoreCnt=5120 Features=(null) PartitionName=(null) Flags=MAINT,OVERLAP,IGNORE_JOBS,SPEC_NODES
TRES=cpu=10240
Users=root Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
ReservationName=intro_cpu StartTime=2023-06-08T14:00:00 EndTime=2023-06-08T15:30:00 Duration=01:30:00
Nodes=nid[004174-004185,004187-004195,004200-004213] NodeCnt=35 CoreCnt=4480 Features=(null) PartitionName=regular_milan_ss11 Flags=
TRES=cpu=8960
Users=(null) Groups=(null) Accounts=ntrain3 Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
ReservationName=intro_gpu StartTime=2023-06-08T14:00:00 EndTime=2023-06-08T15:30:00 Duration=01:30:00
Nodes=nid[001004,001008-001009,001012-001013,001016-001017,001020-001021,001028] NodeCnt=10 CoreCnt=640 Features=(null) PartitionName=gpu_ss11 Flags=
TRES=cpu=1280
Users=(null) Groups=(null) Accounts=ntrain3_g Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
Before a reservation can be used, the system administrators must grant access to it, either to individual users or to accounts (projects). To filter by reservation name, you can use any of the following:
scontrol show reservations=<ReservationName>
scontrol show reservations <ReservationName>
scontrol show res=<ReservationName>
scontrol show res <ReservationName>
Shown below is a summary of the reservation intro_cpu:
$ scontrol show res=intro_cpu
ReservationName=intro_cpu StartTime=2023-06-08T14:00:00 EndTime=2023-06-08T15:30:00 Duration=01:30:00
Nodes=nid[004174-004185,004187-004195,004200-004213] NodeCnt=35 CoreCnt=4480 Features=(null) PartitionName=regular_milan_ss11 Flags=
TRES=cpu=8960
Users=(null) Groups=(null) Accounts=ntrain3 Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
The intro_cpu reservation is accessible to all users belonging to the ntrain3 account because Accounts=ntrain3 is set, even though no users are defined (Users=(null)). The Users option can be set to limit which users can access the reservation; for example, a reservation with Users=elvis,jimi could be used only by users elvis and jimi. Note that State will show as ACTIVE (instead of INACTIVE as above) during the reservation window.
If you notice an error in your reservation, please contact us so that we can correct it. Check all fields listed in the reservation to confirm that the allocated resources match your request.
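To check individual fields against your request, the scontrol output can be split into its KEY=VALUE tokens. The sketch below assumes that values contain no spaces, which holds for the records shown above. `res_field` is a hypothetical helper name, and the sample text is copied from the intro_cpu output so that the example runs without Slurm.

```shell
#!/usr/bin/env bash
# res_field KEY: read reservation output on stdin and print the value of KEY.
# Hypothetical helper; assumes space-separated KEY=VALUE tokens with no
# spaces inside values.
res_field() {
    local key=$1
    # Put each token on its own line, then print the text after "KEY=".
    tr -s ' ' '\n' |
        awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }'
}

# Sample record copied from the intro_cpu output shown earlier.
sample='ReservationName=intro_cpu StartTime=2023-06-08T14:00:00 EndTime=2023-06-08T15:30:00 Duration=01:30:00
Nodes=nid[004174-004185,004187-004195,004200-004213] NodeCnt=35 CoreCnt=4480 Features=(null) PartitionName=regular_milan_ss11 Flags=
TRES=cpu=8960
Users=(null) Groups=(null) Accounts=ntrain3 Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)'

# Print the fields most worth checking against the original request.
for key in StartTime EndTime NodeCnt Users Accounts State; do
    printf '%-10s %s\n' "$key" "$(printf '%s\n' "$sample" | res_field "$key")"
done
```

On a system with Slurm, the same helper could read live output instead, for example scontrol show res=intro_cpu | res_field Accounts.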
Using a reservation¶
Once your reservation request is approved and the reservation is placed on the system, you can run jobs in it by using the --reservation option on the command line:
nersc$ sbatch --reservation=<reservation_name>
nersc$ salloc --reservation=<reservation_name>
or add #SBATCH --reservation=<reservation_name> to your job script.
Note
You can submit jobs to a reservation as soon as it has been created; they will start as soon as the reservation becomes available.
An existing job that is queued but has not yet started can be updated to run in a reservation using the command scontrol update jobid=<jobid> reservationname=<reservation_name>. For example, to update job 12345678 to run in the reservation called cool_res, the command is scontrol update jobid=12345678 reservationname=cool_res.
To remove job 12345678 from requesting to run in the reservation, simply repeat the update command with no argument to reservationname: scontrol update jobid=12345678 reservationname=. (This can be especially useful in the case where there are leftover jobs after the reservation ends.)
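When several jobs are left over, the same update can be generated for each of them. The following is a dry-run sketch: `clear_reservation_cmds` is a hypothetical helper that only prints the scontrol commands so they can be reviewed first, and the second job ID is made up for illustration.

```shell
#!/usr/bin/env bash
# clear_reservation_cmds: read job IDs on stdin (one per line) and print the
# scontrol command that would remove each job from its reservation.
# Hypothetical helper; it prints commands only and never runs them.
clear_reservation_cmds() {
    local jobid
    while read -r jobid; do
        [ -n "$jobid" ] || continue   # skip blank lines
        echo "scontrol update jobid=${jobid} reservationname="
    done
}

# Dry run with two example job IDs:
printf '%s\n' 12345678 12345679 | clear_reservation_cmds
```

On a live system the job IDs might come from squeue, for example squeue --me --noheader --format=%i --reservation=cool_res, assuming your Slurm version supports these options. After reviewing the printed commands, you can run them by hand.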
Job scripts for reservations¶
Job scripts for reservations are essentially identical to batch scripts submitted to any existing QOS, with two notable differences:
- The --reservation=<reservation_name> option must be included, either within the script or on the command line as described above.
- The walltime limit (--time or -t) is unconstrained. For example, it can exceed the maximum walltime in the regular QOS. It can also exceed the duration of the reservation (but the job will be killed when the reservation ends).
The --constraint (or -C) option is still required. The --qos (or -q) option is essentially ignored, except that --qos=shared will enable the same behavior as the shared QOS (permitting separate jobs to run on a fraction of the same node at the same time). For best results, include the --account (-A) option.
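Putting these options together, a job script for a reservation might look like the sketch below. Every value is a placeholder chosen for illustration: the reservation name cool_res, the constraint value cpu, the account m0000, the node count, the time limit, and the application ./my_app are all assumptions, so substitute the values that match your own reservation.

```shell
#!/bin/bash
# Sketch of a job script for a reservation; all values are placeholders.

# Reservation name as shown by scontrol show res (hypothetical name).
#SBATCH --reservation=cool_res

# Still required; use the constraint that matches the reserved nodes
# (the value cpu is an assumption).
#SBATCH --constraint=cpu

# Project that was granted access to the reservation (hypothetical account).
#SBATCH --account=m0000

# Node count should fit within the nodes reserved for you.
#SBATCH --nodes=2

# The walltime is not capped by a QOS limit, but the job is killed when the
# reservation ends regardless of this value.
#SBATCH --time=03:00:00

srun ./my_app
```

No --qos line is included because that option is essentially ignored inside a reservation, unless the shared behavior described above is wanted.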
Ending a reservation¶
All running jobs under a reservation will be terminated when the reservation ends. There are two ways to end a reservation earlier than scheduled:
- When requesting the reservation, you can ask us to activate a setting that will terminate the reservation 15 minutes after all jobs in the reservation queue have completed. Please note that if you choose this option and your reservation is inadvertently terminated, we will not be able to schedule a new reservation for you until the next business day.
- If your reservation does not have the above setting and you complete all planned computations before the reservation ends, please call NERSC operations at 1-800-666-3772 (or 1-510-486-8600), menu option 1, to cancel the reservation.