GitLab is a DevOps platform that lets software development teams collaborate by hosting code in a source repository and automating the build, integration, and verification of code using Continuous Integration (CI)/Continuous Delivery (CD). The GitLab project is open source and actively maintained by GitLab Inc.
NERSC provides a user-facing GitLab service at https://software.nersc.gov/. You must log in with your NERSC credentials to access the service.
Running CI Pipelines at NERSC
The GitLab server provides shared runners that run CI jobs on NERSC resources. The following runners are currently available:
- cori: submits jobs to the cluster using the system default Slurm binaries (/usr/bin/sbatch).
- cori-esslurm: submits jobs using the esslurm Slurm binaries in /opt/esslurm/bin/. Use this runner when you need to submit jobs to the Cori GPU cluster. Please refer to https://software.nersc.gov/ci-resources/corigpu for how to use cori-esslurm to submit jobs to the Cori GPU cluster.
There is no GitLab runner for Perlmutter at the moment; we plan to add one in the near future.
We are currently unable to run CI jobs on Cori GPU via SCHEDULER_PARAMETERS; we plan to have a fix in the near future. Please see the https://software.nersc.gov/ci-resources/corigpu project for the status of the Cori GPU pipeline.
We use Jacamar CI, a GitLab custom executor that allows CI/CD jobs to run on HPC systems. Jacamar provides integration with batch schedulers and downscoping of permissions to ensure that jobs run under your user account. We recommend that you review the ECP-CI documentation.
Please be careful about what you run in your CI jobs, as they run under your user account. A GitLab job has access to all the shared filesystems, including your $HOME directory, that you normally have when accessing the system. Sensitive information should not be stored on NERSC systems or displayed in a GitLab job. You are responsible for the proper use of NERSC systems, including the GitLab service. We are not responsible for any loss of data or issues with your user environment that result from a CI job.
GitLab CI configuration is declared in a special file, .gitlab-ci.yml, that is typically located in the root of the project. Please review the reference guide for .gitlab-ci.yml and the GitLab CI/CD documentation, making sure you read the documentation for the appropriate version. You can see the GitLab version by navigating to https://software.nersc.gov/help.
Jacamar CI supports integration with several batch schedulers, including Slurm, LSF, and Cobalt. In GitLab this is configured through the SCHEDULER_PARAMETERS variable, which is used to request an allocation on compute nodes. The variable can be defined in .gitlab-ci.yml or as a project CI/CD variable.
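As a minimal sketch of the .gitlab-ci.yml option, the variable can be set once at the top level so that every job inherits it, and then overridden for an individual job. The job names and the Slurm options in the second job are hypothetical and only illustrate the mechanism; they are not NERSC recommendations.

```yaml
# Hypothetical .gitlab-ci.yml fragment; job names and Slurm options are illustrative.
variables:
  # Top-level variables are inherited by every job in the pipeline.
  SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"

quick-check:
  tags: [cori]
  script:
    - hostname

longer-check:
  tags: [cori]
  variables:
    # A job-level value takes precedence over the top-level value.
    SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N2 -t 00:10:00"
  script:
    - hostname
```

Note that in GitLab's variable precedence a project CI/CD variable with the same name overrides values set in .gitlab-ci.yml, so it is usually simpler to define SCHEDULER_PARAMETERS in only one of the two places.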
Review the Slurm example jobs to see how to submit a job. It is important to define the Slurm options correctly in SCHEDULER_PARAMETERS; otherwise your job will fail during Slurm allocation. Here is a simple example that submits a job to a Cori Haswell node. The tags keyword selects the GitLab runner to use; in this case tags: [cori] tells GitLab to send the job to the Cori system. The script, before_script, and after_script keywords define sections where you can run arbitrary shell commands. The stages keyword defines a list of stage names used to group GitLab jobs; all jobs within a stage can execute in parallel. The stage keyword is used within a job to assign it to a stage; in this example the job is named cori-haswell.
You can find this example in https://software.nersc.gov/ci-resources/hello-environment.
The GitLab runners are unavailable whenever the system is offline, which may cause CI jobs to be terminated or to fail.
    stages:
      - examine

    cori-haswell:
      stage: examine
      tags: [cori]
      variables:
        SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"
      script:
        - echo "Script"
        - bash ./environment.bash
      before_script:
        - echo "Before Script"
        - pwd
        - ls -la
      after_script:
        - echo "After Script"
        - whoami
        - hostname
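To illustrate how stages group jobs, here is a minimal sketch of a two-stage pipeline. The stage names, job names, and echo commands are hypothetical; only the tags value and the Slurm options are taken from the example above. Whether the two test jobs actually run at the same time depends on runner and scheduler availability.

```yaml
# Hypothetical pipeline; stage and job names are illustrative.
stages:
  - build
  - test

variables:
  SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"

compile:
  stage: build
  tags: [cori]
  script:
    - echo "build step"

# Both jobs below belong to the test stage, so GitLab may run them
# in parallel once every job in the build stage has succeeded.
unit-tests:
  stage: test
  tags: [cori]
  script:
    - echo "unit tests"

smoke-tests:
  stage: test
  tags: [cori]
  script:
    - echo "smoke tests"
```

Stages run in the order listed under stages, so a failure in compile prevents both test jobs from starting.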
Increase Job Timeout
By default, a GitLab job times out after 60 minutes; GitLab then terminates the job and marks it as failed. You can increase the job timeout in the project settings by navigating to Settings > CI/CD > General Pipelines and setting the Timeout value in minutes (10m), hours (10h), or days (10d). The maximum time limit is 30 days.
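Separately from the project setting, GitLab CI also has a job-level timeout keyword that can be set in .gitlab-ci.yml. The following is a minimal sketch, assuming the GitLab version running at https://software.nersc.gov supports the keyword and that the runner's own timeout is at least as long. The job name, QOS, walltime, and script name are hypothetical.

```yaml
# Hypothetical job; name, QOS, walltime, and script are illustrative assumptions.
long-running-job:
  tags: [cori]
  # Job-level timeout for this job only.
  timeout: 3h
  variables:
    # Keep the Slurm walltime (-t) below the GitLab timeout so that the
    # batch job can finish before GitLab terminates the CI job.
    SCHEDULER_PARAMETERS: "-C haswell --qos=regular -N1 -t 02:30:00"
  script:
    - bash ./long_task.bash
```

Time spent waiting in the Slurm queue may also count toward the GitLab timeout, so it can help to leave some headroom between the two values.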
To use our GitLab server, you need to create a Personal Access Token to perform any action, since we have disabled SSH authentication for cloning repositories. To create an access token, navigate to https://software.nersc.gov/-/profile/personal_access_tokens and create a token with a name and an appropriate scope. We recommend enabling the write_repository scope to read from and write to repositories; if you plan to use the GitLab API, you may also enable the api scope. Once you create a token, you will see a randomly generated token value. Please save this token; if you are using a Mac, you can use Keychain Access to store it.
| Introduction to CI at NERSC | July 7th, 2021 | Slides |
- CI Tutorial: https://software.nersc.gov/ci-resources