GitLab
GitLab is a DevOps platform that lets software development teams collaborate by hosting code in source repositories and automating the build, integration, and verification of code using Continuous Integration (CI) and Continuous Delivery (CD). The GitLab project is open source and actively maintained by GitLab Inc.
Access
NERSC provides a user-facing GitLab service at https://software.nersc.gov/. You will need to log in with your NERSC credentials to access the service.
Running CI Pipelines at NERSC
The GitLab server provides shared runners for running CI jobs on NERSC resources. The following runners are currently available:
Runner Name | System | Access |
---|---|---|
cori | Cori | All Users |
cori-esslurm | Cori | All Users |
The `cori` runner uses the system default Slurm binaries (`/usr/bin/sbatch`) to submit jobs to the cluster, whereas the `cori-esslurm` runner submits jobs using the esslurm Slurm binaries in `/opt/esslurm/bin/`. Use the `cori-esslurm` runner when you need to submit jobs to the Cori GPU cluster. Please refer to https://software.nersc.gov/ci-resources/corigpu for how to use `cori-esslurm` to submit jobs to the Cori GPU cluster.
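As a minimal sketch of runner selection, the fragment below routes a job to the `cori-esslurm` runner through the `tags` keyword. The job name `esslurm-check` and its script are hypothetical placeholders; only the `tags` line is the point here, and the rest of the job (including any allocation request) should follow the corigpu project.

```yaml
# Hypothetical job name; only the tags line matters for runner selection.
esslurm-check:
  tags: [cori-esslurm]   # send this job to the cori-esslurm runner instead of cori
  script:
    - echo "Running via the cori-esslurm runner"
```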
Note
There is currently no GitLab runner for Perlmutter; we plan to add one in the near future.
Note
Currently we are unable to run CI jobs on Cori GPU via `SCHEDULER_PARAMETERS`; we plan to have a fix in the near future. Please see the https://software.nersc.gov/ci-resources/corigpu project for the status of the Cori GPU pipeline.
We use Jacamar CI, a GitLab custom executor for running CI/CD jobs on HPC systems. Jacamar integrates with batch schedulers and downscopes permissions so that jobs run under your own user account. We recommend that you review the ECP-CI documentation.
Warning
Be careful about what you run in your CI jobs, because they run under your user account. A GitLab job has access to all the shared filesystems you normally have when accessing the system, including your $HOME directory. Do not store sensitive information on NERSC systems or display it in GitLab job output. You are responsible for proper use of NERSC systems, including the GitLab service. We are not responsible for any loss of data or issues with your user environment resulting from CI jobs.
GitLab CI configuration is declared in a special file, `.gitlab-ci.yml`, which is typically located in the root of the project. Please review the `.gitlab-ci.yml` reference guide. We also encourage you to review the GitLab CI/CD documentation; make sure you read the documentation for the GitLab version running on the server, which you can find at https://software.nersc.gov/help.
Scheduler Integration
Jacamar CI supports integration with several batch schedulers, including Slurm, LSF, and Cobalt. In GitLab, the scheduler options are passed through the `SCHEDULER_PARAMETERS` variable, which is used to request an allocation on compute nodes. The variable can be defined in `.gitlab-ci.yml` or as a project CI/CD variable.
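The sketch below shows both places a `SCHEDULER_PARAMETERS` value can live in `.gitlab-ci.yml`: a top-level `variables` block that every job inherits, and a job-level `variables` block that overrides it for a single job. The stage `test` and the job names `haswell-test` and `knl-test` are hypothetical; the Haswell options mirror the example in this section, and `-C knl` is an assumed constraint for the second job.

```yaml
# Pipeline-wide default: every job inherits this allocation request.
variables:
  SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"

stages:
  - test

# Hypothetical job that uses the pipeline-wide SCHEDULER_PARAMETERS above.
haswell-test:
  stage: test
  tags: [cori]
  script:
    - hostname

# Hypothetical job whose job-level value overrides the default for this job only.
knl-test:
  stage: test
  tags: [cori]
  variables:
    SCHEDULER_PARAMETERS: "-C knl --qos=debug -N1 -t 00:10:00"
  script:
    - hostname
```

Under GitLab's variable precedence rules, a project CI/CD variable with the same name (Settings > CI/CD > Variables) overrides both of these values, which is worth checking if a job requests an allocation you did not expect.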
Please check the Slurm example jobs to see how to submit a job. It is important to define the Slurm options correctly via `SCHEDULER_PARAMETERS`; otherwise your job will fail during Slurm allocation.

Below is a simple example that submits a job to a Cori Haswell node. The `tags` keyword selects the GitLab runner; in this case, `tags: [cori]` tells GitLab to send the job to the Cori system. The `script`, `before_script`, and `after_script` keywords define sections where you can run arbitrary shell commands. The `stages` keyword defines a list of stage names used to group GitLab jobs; all jobs within a stage can execute in parallel. The `stage` keyword is used inside a GitLab job to assign it to one of those stages; in this example, the job is named `cori-haswell` and belongs to the `examine` stage.
You can find this example in https://software.nersc.gov/ci-resources/hello-environment.
Note
The GitLab runner will be down when the system is offline, which may cause CI jobs to be terminated or to fail.
```yaml
stages:
  - examine

cori-haswell:
  stage: examine
  tags: [cori]
  variables:
    SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"
  script:
    - echo "Script"
    - bash ./environment.bash
  before_script:
    - echo "Before Script"
    - pwd
    - ls -la
  after_script:
    - echo "After Script"
    - whoami
    - hostname
```
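To illustrate how `stages` and `stage` interact, the hypothetical pipeline below has two stages. GitLab runs the `build` stage first; once it succeeds, the two jobs in the `test` stage can run in parallel, each requesting its own Slurm allocation. The job names and the scripts `build.sh` and `run_tests.sh` are placeholders, and `-C knl` is an assumed constraint.

```yaml
# Stages run in the order listed; jobs in the same stage may run in parallel.
stages:
  - build
  - test

build-code:
  stage: build
  tags: [cori]
  variables:
    SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:10:00"
  script:
    - bash ./build.sh        # hypothetical build script

# Both test jobs start only after build-code succeeds.
test-haswell:
  stage: test
  tags: [cori]
  variables:
    SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"
  script:
    - bash ./run_tests.sh    # hypothetical test script

test-knl:
  stage: test
  tags: [cori]
  variables:
    SCHEDULER_PARAMETERS: "-C knl --qos=debug -N1 -t 00:05:00"
  script:
    - bash ./run_tests.sh    # hypothetical test script
```

GitLab does not guarantee that jobs share a working directory, so anything the test jobs need from `build-code` would have to be passed explicitly, for example with GitLab's `artifacts` keyword.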
Increase Job Timeout
By default, a GitLab job times out after 60 minutes, at which point GitLab terminates the job and marks it as failed. You can increase the job timeout in the project settings by navigating to `Settings > CI/CD > General Pipelines` and setting the Timeout value in minutes (`10m`), hours (`10h`), or days (`10d`). The maximum time limit is 30 days (`30d`).
For more details see https://docs.gitlab.com/ee/ci/pipelines/settings.html#set-a-limit-for-how-long-jobs-can-run
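If only one job needs a longer limit, GitLab also provides a job-level `timeout` keyword in `.gitlab-ci.yml`. The sketch below assumes the GitLab version on the server (see https://software.nersc.gov/help) supports it; per the GitLab documentation, a job-level timeout can be longer than the project timeout but cannot exceed a runner-level timeout, if one is set. The job name and script are hypothetical. The Slurm walltime (`-t`) is kept below the GitLab timeout because the GitLab job is already running while Slurm waits to start the allocation.

```yaml
long-run:
  tags: [cori]
  timeout: 3h                  # job-level GitLab timeout (assumes keyword support)
  variables:
    # Slurm walltime kept below the GitLab timeout to leave room for queue wait
    SCHEDULER_PARAMETERS: "-C haswell --qos=regular -N1 -t 02:00:00"
  script:
    - bash ./long_test.sh      # hypothetical long-running test script
```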
Access Token
To use our GitLab server, you will need to create a Personal Access Token, because SSH authentication for cloning repositories has been disabled. To create an access token, navigate to https://software.nersc.gov/-/profile/personal_access_tokens and create a token with a name and the appropriate scopes. We recommend enabling the `read_repository` and `write_repository` scopes to read from and write to repositories. If you plan to use the GitLab API, you may also enable the `read_api`, `read_user`, and `api` scopes. Once you create the token, GitLab displays a randomly generated token string only once, so save it right away. If you are using a Mac, you can store the token in Keychain Access.
Resources
Title | Date | Links |
---|---|---|
Introduction to CI at NERSC | July 7th, 2021 | Slides, Video |
- CI Tutorial: https://software.nersc.gov/ci-resources