Nextflow
Nextflow is a data-centric workflow management tool which facilitates complex and reproducible scientific computational workloads.
A workflow pattern including the following characteristics might be successfully realized by using Nextflow at NERSC:
- It contains significant complexity in the form of different applications, data formats, repetition, conditional branching, and/or dependencies between all of these
- The maximum scale your project will require is low
- Fast workflow turnaround is not a priority
- You already possess a system implemented with Nextflow
Strengths of Nextflow:
- Simple installation and basic execution
- No persistent database needed
- Powerful ability to express complex tasks and relationships between tasks
Disadvantages of Nextflow:
- Easy to unintentionally create a configuration which degrades scheduler performance for all NERSC users
- Weak integration with Slurm that is unable to adapt to NERSC scheduler policies. This can lead to long queue waiting times for tasks and excessive turnaround for workflows as a whole
- Poor ability to navigate NERSC policy constraints such as the limited maximum wall time available for resource requests
- Nextflow makes a lot of duplicate and intermediate file copies as it operates; processing large amounts of data can exhaust storage quotas
- Nextflow uses shared file systems to preserve workflow state; caching or slow synchronization of this information can be a source of undesired behavior
How to use Nextflow at NERSC
Start with the Nextflow documentation Quick Start. The system Java installation provided by default on Perlmutter is sufficient to run Nextflow.
This documentation only discusses specifics of running Nextflow at NERSC; for guidance expressing your workflows with Nextflow scripts, see the official Nextflow documentation.
For the most basic proof-of-concept demonstration, first choose a location on $SCRATCH and install Nextflow there with the command curl -s https://get.nextflow.io | bash
Then run the pre-installed demonstration with the command ./nextflow run hello
Basic Example Output
elvis@login10:/nextflow> ./nextflow run hello
N E X T F L O W ~ version 24.10.2
Launching `https://github.com/nextflow-io/hello` [focused_leakey] DSL2 - revision: afff16a9b4 [master]
executor > local (4)
[34/6eaf2b] sayHello (2) [100%] 4 of 4 ✔
Hola world!
Hello world!
Bonjour world!
Ciao world!
elvis@login10:/nextflow>
We can directly run this same test by creating the following file and then using the command ./nextflow run local_hello.nf
This displays the same result as the first test (besides changes in unique run identifiers and time stamps).
local_hello.nf
#!/usr/bin/env nextflow
process sayHello {
    input:
    val x
    output:
    stdout
    script:
    """
    echo '$x world!'
    """
}
workflow {
    Channel.of('Bonjour', 'Ciao', 'Hello', 'Hola') | sayHello | view
}
Warning
Nextflow uses hidden files and directories for things like batch submission scripts and log files. Keep this in mind when debugging a workflow.
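As a minimal sketch of where to look when debugging, the helper below reports task directories whose recorded exit status is non-zero. It assumes the default Nextflow layout, in which each task directory under the work directory holds hidden files such as .command.sh, .command.log, and .exitcode; the function name failed_tasks is hypothetical.

```shell
# List task directories whose recorded exit status is non-zero.
# Assumes the default Nextflow layout: <work dir>/<xx>/<hash>/.exitcode
failed_tasks() {
    workdir=${1:-work}
    find "$workdir" -type f -name .exitcode | sort | while read -r f; do
        status=$(cat "$f")
        if [ "$status" != "0" ]; then
            echo "$(dirname "$f") exit=$status"
        fi
    done
}

# Show hidden files in the launch directory (for example .nextflow.log):
#   ls -a
# Report failed tasks under ./work:
#   failed_tasks work
```

Each reported directory can then be inspected with ls -a to read the task's .command.sh and .command.err files.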
Now that the basic operation is confirmed, we will describe how to run the Nextflow management and compute processes on more appropriate resources than a login node.
Wrapped Nextflow submission to Slurm
The easiest method is placing the ./nextflow ... command into a batch script and submitting it to Slurm with sbatch. The manager process will run on the allocated compute node, and all tasks are configured to use the local executor; it's even possible to use srun in your processes to run tasks which include MPI applications.
The major benefit of this method, besides simplicity, is that only the initial submission waits in a Slurm queue; it is a good pattern for a workflow which includes a very large number of small tasks. It is the wrong approach if any individual task, or the sum total of all tasks, requires longer than the single job wall time limit to complete.
One should not combine this method with the Nextflow 'Slurm' executor because the job running the Nextflow manager can end before requested tasks are finished waiting in a queue or executing.
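A wrapped submission could look like the sketch below, which writes a batch script to a file so it can be inspected before it is submitted with sbatch. The file name nextflow_wrapped.sh, the run directory $SCRATCH/nextflow, the resource requests, and the use of local_hello.nf are all illustrative assumptions; adjust them to your own account and workflow.

```shell
# Write a batch script that runs the Nextflow manager and all of its
# tasks inside a single Slurm allocation (local executor).
cat > nextflow_wrapped.sh <<'EOF'
#!/bin/bash
#SBATCH -q regular
#SBATCH -C cpu
#SBATCH -N 1
#SBATCH -t 02:00:00

# Run from a directory on $SCRATCH (assumed location of ./nextflow).
cd "$SCRATCH/nextflow" || exit 1

# -resume reuses results of tasks that finished in an earlier run.
./nextflow run local_hello.nf -resume
EOF

# Submit with: sbatch nextflow_wrapped.sh
```

No executor is configured here, so Nextflow falls back to the local executor and every task runs inside the allocation.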
There are two significant caveats to running the Nextflow workflow process directly inside a Slurm job allocation:
- The Nextflow working directory must be placed on the Perlmutter Lustre $SCRATCH filesystem. Nextflow uses a file locking feature not available on any of the other NERSC filesystems.
- The workflow cannot run longer than the maximum wall time available to a single job in the Slurm QOS being used. This can be partially mitigated by using multiple Slurm submissions in series and passing the -resume flag to Nextflow, but only progress for completely finished tasks will be preserved from one submission to the next.
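One way to run multiple submissions in series is to chain them with Slurm job dependencies, as in the sketch below. It assumes a batch script (here the hypothetical name my_nextflow_job.sh) whose Nextflow command includes -resume, and it relies on the standard sbatch options --parsable and --dependency=afterany. The function name submit_chain and the SBATCH_CMD override are illustrative only.

```shell
# Submit the same batch script several times; each job starts only
# after the previous one has ended (for any reason), so a run that hit
# the wall time limit is picked up again by the next job via -resume.
submit_chain() {
    script=$1
    count=$2
    sbatch_cmd=${SBATCH_CMD:-sbatch}   # override only for dry runs
    prev=""
    i=1
    while [ "$i" -le "$count" ]; do
        if [ -z "$prev" ]; then
            jobid=$("$sbatch_cmd" --parsable "$script") || return 1
        else
            jobid=$("$sbatch_cmd" --parsable \
                --dependency=afterany:"$prev" "$script") || return 1
        fi
        jobid=${jobid%%;*}   # drop a ";cluster" suffix if one is printed
        echo "$jobid"
        prev=$jobid
        i=$((i + 1))
    done
}

# Example: three submissions in series.
#   submit_chain my_nextflow_job.sh 3
```

Jobs later in the chain still run after the workflow has completed; with -resume they should find little or nothing left to do, but they do spend time waiting in the queue, so keep the chain short.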
Nextflow Submits Tasks as Slurm Jobs
Nextflow configuration can instruct the manager process to submit its tasks to Slurm instead of running them on the local host. If left with a default configuration, this manager process can generate a disruptive amount of communication requests to Slurm; the following configuration file entries reduce the frequency of those requests.
Place the following file in your Nextflow project working directory:
nextflow.config
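// Process directives: run every task as a Slurm job with default resources.
process {
    executor = 'slurm'
    clusterOptions = '-q debug -t 00:30:00 -C cpu'
}
// Executor settings: limit how often Nextflow queries and submits to Slurm.
executor {
    queueSize = 15
    pollInterval = '5 min'
    dumpInterval = '6 min'
    queueStatInterval = '5 min'
    exitReadTimeout = '13 min'
    killBatchSize = 30
    submitRateLimit = '20 min'
}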
Inside the individual process definitions in your scripts, you will likely wish to override the clusterOptions directive to request specific resources appropriate for that task. This can be done by adding a line in the pattern of clusterOptions '-q regular -t 05:30:00 -C cpu' to the top of your task process blocks.
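As a sketch of that pattern, the snippet below writes a small workflow file in which one process sets its own clusterOptions, written in Nextflow's directive form (name followed by value). The file name override_example.nf, the process name longTask, and its echo command are hypothetical placeholders.

```shell
# Write an example workflow whose process requests its own Slurm
# resources instead of the default from nextflow.config.
cat > override_example.nf <<'EOF'
#!/usr/bin/env nextflow
process longTask {
    // Directive placed at the top of the process block.
    clusterOptions '-q regular -t 05:30:00 -C cpu'

    input:
    val x

    output:
    stdout

    script:
    """
    echo '$x finished'
    """
}
workflow {
    Channel.of('sampleA', 'sampleB') | longTask | view
}
EOF

# Run with: ./nextflow run override_example.nf
```

Processes without their own clusterOptions line continue to use the default set in nextflow.config.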
The choice of machine used to run the Nextflow manager process is also important. A login node is fine for campaigns lasting a few days or less, but for longer workflows NERSC recommends using the Perlmutter workflow QOS, which is intended for lightweight tasks that coordinate computational work. Users should be prepared to check on the Nextflow manager process occasionally and to restart, with the -resume flag, a long-campaign Nextflow process that is found to have been interrupted.
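A periodic check of the manager could be scripted along the lines below. This is only a sketch: it assumes you recorded the manager's process ID in a file when you launched it (the names manager.pid, manager.log, and main.nf are hypothetical), and that the check runs on the same node as the manager.

```shell
# Return success if the process recorded in the PID file is still running.
manager_alive() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Example launch, recording the PID (run from the project directory):
#   nohup ./nextflow run main.nf -resume > manager.log 2>&1 &
#   echo $! > manager.pid
#
# Example check; relaunch with the same command if it reports "stopped":
#   manager_alive manager.pid && echo running || echo stopped
```

Because the launch command already includes -resume, repeating it after an interruption continues from the tasks that had completely finished.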