# Workflow Management Tools
Supporting data-centric science involves the movement of data, multi-stage processing, and visualization at scales where manual control becomes prohibitive and automation is needed. Workflow technologies can improve the productivity and efficiency of data-centric science by orchestrating and automating these steps.
A NERSC working group review and refresh of this content is currently in progress; an initial update is expected by April 2020. In the meantime we request the following of users considering workflow management solutions:
- Before you begin developing a codebase that requires a particular workflow manager, please contact NERSC consultants to confirm it can be used effectively at NERSC. Some tools have infrastructure needs, or operate in a manner that is fundamentally incompatible with NERSC systems, and we'd like to protect users from wasting effort if we can.
- Please do not write your own workflow manager. More than 200 such solutions already exist, and almost certainly one of them will fit both your needs and our infrastructure.
- Please don't do this:

  ```shell
  # Anti-pattern: thousands of single-task sruns issued in a loop
  for i in $(seq 1 10000); do
      srun -n 1 a.out
  done
  ```

  Issuing many `srun` calls in a short period of time puts serious stress on our Slurm scheduler. It will hurt not only your own job's performance, but also the performance of all other NERSC users' jobs. If this is what your application needs, please consider a workflow tool. This is what they were designed to do!
GNU Parallel is a shell tool for executing commands in parallel and in sequence on a single node. Parallel is a very usable and effective tool for running High Throughput Computing workloads without data dependencies at NERSC. With a few simple Slurm command patterns, parallel can also scale up to run tasks across job allocations spanning multiple nodes.
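As a sketch of the single-node pattern, a batch script driving GNU parallel might look like the following. The program name `my_task.sh`, the task count, and the resource requests are placeholders, not NERSC-specific values; here we write the script to a file so it could be submitted with `sbatch`:

```shell
# Write a hypothetical one-node Slurm batch script that runs 128
# independent tasks through GNU parallel, at most 32 at a time.
cat > parallel_batch.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:30:00
# One job, one parallel invocation: no per-task srun calls hit the scheduler.
seq 1 128 | parallel -j 32 "./my_task.sh {} > task_{}.log"
EOF
grep -c SBATCH parallel_batch.sh   # trivial sanity check: 2 directives
```

The key point is that the scheduler sees a single job, while parallel handles the per-task fan-out on the node.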
TaskFarmer is a utility developed at NERSC to distribute single-node tasks (single- or multi-core) across a set of compute nodes. TaskFarmer tracks which tasks have completed successfully, and allows straightforward re-submission of failed or never-run tasks from a checkpoint file.
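The checkpoint idea can be illustrated in plain shell. This is a conceptual sketch only, not TaskFarmer's actual file format or interface; see the TaskFarmer documentation for real usage:

```shell
# Conceptual sketch: a task list plus a checkpoint of completed tasks.
printf '%s\n' "task1" "task2" "task3" > tasks.txt   # everything to run
printf '%s\n' "task2" > done.txt                    # checkpoint: task2 finished
# Re-submission means running only tasks not yet recorded as done:
grep -Fvx -f done.txt tasks.txt                     # prints: task1, task3
```

Because completion is tracked in a separate file, a crashed or timed-out job can be resumed without repeating finished work.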
The Swift scripting language provides a simple, compact way to write parallel scripts that run many copies of ordinary programs concurrently in various workflow patterns, reducing the need for complex parallel programming or arcane scripting. Swift is very general, and is in use in domains ranging from earth systems to bioinformatics to molecular modeling.
FireWorks is a free, open-source code for defining, managing, and executing scientific workflows. It can be used to automate calculations over arbitrary computing resources, including those that have a queueing system. Some features that distinguish FireWorks are dynamic workflows, failure-detection routines, and built-in tools and execution modes for running high-throughput computations at large computing centers. It uses a centralized server model, where the server manages the workflows and workers run the jobs.
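As a sketch of the centralized-server model, a minimal FireWorks workflow follows the pattern of the FireWorks quickstart: define a step, wrap it in a Firework, and add it to the LaunchPad. Actually running it requires a configured LaunchPad (MongoDB server), so here we only write the script out:

```shell
# Write a hypothetical FireWorks script; executing it needs a live LaunchPad.
cat > hello_fw.py <<'EOF'
# Sketch based on the FireWorks quickstart (assumes a configured LaunchPad)
from fireworks import Firework, LaunchPad, ScriptTask

launchpad = LaunchPad()                      # connection to the central server
step = ScriptTask.from_str('echo "hello"')   # one shell-command step
launchpad.add_wf(Firework(step))             # register a one-step workflow
EOF
wc -l < hello_fw.py   # line count as a trivial sanity check
```

A worker process (e.g. on a compute node) would then pull and execute this Firework from the central server.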
## Other Workflow Tools
If you find that these tools don't meet your needs, you can check out some of the other workflow tools we currently support.