FireWorks

FireWorks is a free, open-source code for defining, managing, and executing complex scientific workflows.

Strengths of FireWorks

  • Well suited to NERSC: we have many FireWorks users
  • Can handle tasks that contain MPI (example below)
  • Extremely flexible and able to handle complex job structures
  • Friendly dashboard to display workflow status
  • Can run tasks on CPU or GPU (GPU example below)

Disadvantages of FireWorks

  • Requires a MongoDB database, which must be set up by NERSC staff
  • FireWorks requires some time to learn

Overview of FireWorks

FireWorks uses a centralized server model where the server manages the workflows and workers run the jobs. It can handle complex job dependencies, restart jobs that have died, and track the workflow status on a friendly dashboard. It can be used via bash scripting or via a Python API. We will cover using FireWorks via bash shell here. You can find more information in the official FireWorks documentation.

[Figure: overview of the FireWorks dashboard]

To use FireWorks you must first request a MongoDB database. Please use the database creation form to submit your request.

If you need more help with FireWorks itself please open a separate ticket with NERSC consulting at help.nersc.gov.

Terminology

FireWorks uses a number of terms to describe the different parts of the workflow manager:

  • LaunchPad/FireServer: the MongoDB database that controls the workflow. It contains all the tasks to be run and records whether each has run successfully.
  • FireTask: computing tasks to be performed.
  • FireWork: list of FireTasks.
  • Rocket: fetches a FireWork from the LaunchPad and runs it. Could be run on a separate machine (FireWorker) or through a batch system (in this case, SLURM).
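
To make these terms concrete, here is a toy, in-memory model of how they relate. It is a sketch for illustration only and does not use the FireWorks API: the names ToyFireWork, ToyLaunchPad, and launch_rocket are hypothetical, FireTasks are stood in for by plain Python callables, and the state names are borrowed from the ones FireWorks reports.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional


@dataclass
class ToyFireWork:
    """A FireWork: an ordered list of FireTasks (here, plain callables)."""
    fw_id: int
    tasks: List[Callable[[], str]]
    state: str = "READY"
    results: List[str] = field(default_factory=list)


class ToyLaunchPad:
    """Stands in for the MongoDB-backed LaunchPad: stores FireWorks and their state."""

    def __init__(self) -> None:
        self._queue: Deque[ToyFireWork] = deque()
        self.all_fireworks: List[ToyFireWork] = []

    def add(self, tasks: List[Callable[[], str]]) -> ToyFireWork:
        fw = ToyFireWork(fw_id=len(self.all_fireworks) + 1, tasks=tasks)
        self.all_fireworks.append(fw)
        self._queue.append(fw)
        return fw

    def checkout(self) -> Optional[ToyFireWork]:
        """Hand the next waiting FireWork to a Rocket, or None if nothing is left."""
        return self._queue.popleft() if self._queue else None


def launch_rocket(launchpad: ToyLaunchPad) -> bool:
    """A Rocket: fetch one FireWork, run its FireTasks in order, record the outcome."""
    fw = launchpad.checkout()
    if fw is None:
        return False
    fw.state = "RUNNING"
    try:
        fw.results = [task() for task in fw.tasks]
        fw.state = "COMPLETED"
    except Exception:
        fw.state = "FIZZLED"
    return True


launchpad = ToyLaunchPad()
launchpad.add([lambda: "howdy"])
launchpad.add([lambda: "step 1", lambda: "step 2"])

# Keep launching Rockets until the LaunchPad has nothing left to hand out.
while launch_rocket(launchpad):
    pass

for fw in launchpad.all_fireworks:
    print(fw.fw_id, fw.state, fw.results)
```

In the real system the LaunchPad lives in MongoDB, so Rockets started on different machines or inside different SLURM jobs can all pull work from the same place.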

Creating your FireWorks Environment

We recommend that anyone wanting to use FireWorks at NERSC install it in a conda environment.

Create a new conda environment for FireWorks
module load python
conda create -n fireworks python=3.9 -y
conda activate fireworks
mamba install -c conda-forge fireworks pytest

For more information about using conda environments at NERSC, check out this page.

Setting up your LaunchPad for both Cori and Perlmutter

Setting up the LaunchPad can be done interactively using the command lpad init, similar to the example shown below. You will need to specify the name of the database host and your own username and password. Note that you need admin access to the database, so make sure you use the administrator's username and password that you were provided.

On a login node, navigate to the directory where you will issue your fireworks commands. In this example, we are working from $HOME/fw_test. Make sure you have initialized your FireWorks environment, for example via conda activate fireworks.

How to configure your LaunchPad

You can follow the steps in our example below. You'll want to use the address of your MongoDB database, your database name, your administrator username, and the password you were provided.

lpad init

Please supply the following configuration values
(press Enter if you want to accept the defaults)

Enter host parameter. (default: localhost). Example: 'localhost' or 'mongodb+srv://CLUSTERNAME.mongodb.net': mongodb07.nersc.gov
Enter port parameter. (default: 27017). : 
Enter name parameter. (default: fireworks). Database under which to store the fireworks collections: my_db
Enter username parameter. (default: None). Username for MongoDB authentication: my_db_admin
Enter password parameter. (default: None). Password for MongoDB authentication: my_password
Enter ssl_ca_file parameter. (default: None). Path to any client certificate to be used for Mongodb connection: 
Enter authsource parameter. (default: None). Database used for authentication, if not connection db. e.g., for MongoDB Atlas this is sometimes 'admin'.: 

Configuration written to my_launchpad.yaml!

This information will go into a file named "my_launchpad.yaml". This file should be located in the directory where you are issuing FireWorks commands. Note that since our $HOME filesystem is shared, this same launchpad file will be used on both Cori and Perlmutter.
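
Since FireWorks commands depend on finding this file in the current directory, a quick check can save some confusion. The sketch below is a minimal example under a few assumptions: the file uses a flat "key: value" layout, the keys match the parameter names from the lpad init prompts (host, name, username, password), and the helper names read_flat_yaml and launchpad_problems are ours, not part of FireWorks. It is not a full YAML parser.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Parameter names taken from the lpad init prompts (assumed to be the file's keys).
REQUIRED_KEYS = ("host", "name", "username", "password")


def read_flat_yaml(path: Path) -> Dict[str, str]:
    """Read top-level 'key: value' lines; nested or multi-line values are ignored."""
    values: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def launchpad_problems(directory: Path) -> List[str]:
    """Return human-readable problems; an empty list means the file looks usable."""
    path = directory / "my_launchpad.yaml"
    if not path.is_file():
        return [f"{path} not found; run lpad init in this directory"]
    values = read_flat_yaml(path)
    return [
        f"'{key}' is missing or empty"
        for key in REQUIRED_KEYS
        if values.get(key) in (None, "", "null")
    ]


# Demonstration in a throwaway directory, using the placeholder values from above.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    missing_file = launchpad_problems(workdir)
    (workdir / "my_launchpad.yaml").write_text(
        "host: mongodb07.nersc.gov\nport: 27017\nname: my_db\n"
        "username: my_db_admin\npassword: null\n"
    )
    empty_password = launchpad_problems(workdir)

print(missing_file)
print(empty_password)
```

To check your real setup, call launchpad_problems(Path.cwd()) from the directory where you issue FireWorks commands.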

Basic example

Below is an example of how to use FireWorks at NERSC. It is based heavily on the FireWorks tutorial, which you will find on the Materials Project website. Save the following as fw_test.yaml:

spec:
  _tasks:
  - _fw_name: ScriptTask
    script: echo "howdy, your job launched successfully!" >> howdy.txt

Here all we do is write a file called howdy.txt, but in your real workload you can substitute in the name of your script.
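
If you need many such files, you can generate them rather than write each by hand. The following is a minimal sketch, not part of FireWorks: the function name firework_yaml is hypothetical, and it assumes a one-task FireWork with the same layout as the example above. It quotes the command with json.dumps, relying on the fact that a JSON string literal is also a valid YAML double-quoted scalar.

```python
import json


def firework_yaml(script: str) -> str:
    """Return the text of a one-task FireWork file that runs `script` via ScriptTask."""
    # json.dumps quotes and escapes the command so that characters such as
    # ':' or '#' inside it cannot be misread as YAML syntax.
    return (
        "spec:\n"
        "  _tasks:\n"
        "  - _fw_name: ScriptTask\n"
        f"    script: {json.dumps(script)}\n"
    )


text = firework_yaml('echo "howdy, your job launched successfully!" >> howdy.txt')
print(text)
```

Writing the result to disk is then one more line, for example pathlib.Path("fw_test.yaml").write_text(text). The quoted form looks slightly different from the hand-written file but describes the same command.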

Singleshot example

Now let's add our fw_test.yaml to the launchpad and run it:

lpad add fw_test.yaml
rlaunch singleshot

The output should look like:

(fireworks) elvis@cori10:~/fw_test> rlaunch singleshot
2021-02-24 19:57:04,135 INFO Hostname/IP lookup (this will take a few seconds)
2021-02-24 19:57:04,136 INFO Launching Rocket
2021-02-24 19:57:04,223 INFO RUNNING fw_id: 1 in directory: /global/u1/s/elvis/fw_test
2021-02-24 19:57:04,232 INFO Task started: ScriptTask.
2021-02-24 19:57:04,240 INFO Task completed: ScriptTask 
2021-02-24 19:57:04,257 INFO Rocket finished
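
When there are many launches, it can help to summarize this log output with a small script. The sketch below assumes the log lines look exactly like the ones shown in this page; the function name summarize_rocket_log is ours. Note that "finished" here only means the Rocket finished. The authoritative task state is the one stored in the LaunchPad.

```python
import re
from typing import Dict

RUNNING = re.compile(r"INFO RUNNING fw_id: (\d+) in directory: (\S+)")


def summarize_rocket_log(log_text: str) -> Dict[int, Dict[str, object]]:
    """Map each fw_id seen in the log to its run directory and whether its Rocket finished."""
    summary: Dict[int, Dict[str, object]] = {}
    current = None
    for line in log_text.splitlines():
        match = RUNNING.search(line)
        if match:
            current = int(match.group(1))
            summary[current] = {"directory": match.group(2), "finished": False}
        elif "INFO Rocket finished" in line and current is not None:
            summary[current]["finished"] = True
            current = None
    return summary


sample_log = """\
2021-02-24 19:57:04,136 INFO Launching Rocket
2021-02-24 19:57:04,223 INFO RUNNING fw_id: 1 in directory: /global/u1/s/elvis/fw_test
2021-02-24 19:57:04,232 INFO Task started: ScriptTask.
2021-02-24 19:57:04,240 INFO Task completed: ScriptTask
2021-02-24 19:57:04,257 INFO Rocket finished
"""
summary = summarize_rocket_log(sample_log)
print(summary)
```
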

Rapidfire example

One task isn't very interesting. Let's try again with two tasks:

lpad add fw_test.yaml
lpad add fw_test.yaml

Note: adding the FireWork again will result in a second, identical job. Now let's run in rapidfire mode instead of singleshot mode:

rlaunch rapidfire

In this mode, FireWorks will keep launching tasks until none are left in the LaunchPad. The output data (output files and job status) is written to launcher_* directories.

The output should look like:

(fireworks) elvis@cori07:~> rlaunch rapidfire
2021-02-22 13:18:26,274 INFO Hostname/IP lookup (this will take a few seconds)
2021-02-22 13:18:26,351 INFO Created new dir /global/u1/s/elvis/launcher_2021-02-22-21-18-26-349075
2021-02-22 13:18:26,351 INFO Launching Rocket
2021-02-22 13:18:26,370 INFO RUNNING fw_id: 2 in directory: /global/u1/s/elvis/launcher_2021-02-22-21-18-26-349075
2021-02-22 13:18:26,378 INFO Task started: ScriptTask.
2021-02-22 13:18:26,385 INFO Task completed: ScriptTask 
2021-02-22 13:18:26,402 INFO Rocket finished
2021-02-22 13:18:26,406 INFO Created new dir /global/u1/s/elvis/launcher_2021-02-22-21-18-26-406464
2021-02-22 13:18:26,406 INFO Launching Rocket
2021-02-22 13:18:26,423 INFO RUNNING fw_id: 3 in directory: /global/u1/s/elvis/launcher_2021-02-22-21-18-26-406464
2021-02-22 13:18:26,428 INFO Task started: ScriptTask.
2021-02-22 13:18:26,436 INFO Task completed: ScriptTask 
2021-02-22 13:18:26,453 INFO Rocket finished
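
Because each rapidfire launch writes into its own launcher_* directory, you may want to gather the results afterwards. Here is a minimal sketch that collects one named output file from every launcher directory. The function name collect_launcher_files is hypothetical, and the demonstration builds fake launcher directories in a temporary location rather than reading real ones.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def collect_launcher_files(root: Path, filename: str) -> Dict[str, List[str]]:
    """Return the lines of `filename` from each launcher_* directory under `root`."""
    collected: Dict[str, List[str]] = {}
    for launcher in sorted(root.glob("launcher_*")):
        candidate = launcher / filename
        if launcher.is_dir() and candidate.is_file():
            collected[launcher.name] = candidate.read_text().splitlines()
    return collected


# Demonstration in a throwaway directory that mimics two rapidfire launches.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("launcher_2021-02-22-21-18-26-349075",
                 "launcher_2021-02-22-21-18-26-406464"):
        (root / name).mkdir()
        (root / name / "howdy.txt").write_text("howdy, your job launched successfully!\n")
    outputs = collect_launcher_files(root, "howdy.txt")

for launcher_name, lines in outputs.items():
    print(launcher_name, lines)
```

For a real run, pass the directory where you ran rlaunch rapidfire as root.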

Submitting FireWorks jobs via SLURM

You are probably most interested in how to use FireWorks to launch jobs at NERSC via SLURM. We will use an example to demonstrate how this works. You can read more about how to launch FireWorks jobs in a queue in the official FireWorks documentation.

You will need the following:

  1. A queue-adapter for SLURM, my_qadapter.yaml
  2. The script you want to run, wrapped in fw_test.yaml

You will also need to be in the directory where you have configured your LaunchPad (i.e., where you ran lpad init). You can always move your my_launchpad.yaml file or re-configure your LaunchPad.

Here is an example of the SLURM queue adapter. You can copy this and save it as my_qadapter.yaml.

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -l my_launchpad.yaml rapidfire
ntasks: 1
cpus_per_task: 1
ntasks_per_node: 1
walltime: '00:02:00'
queue: debug
constraint: haswell
account: null
job_name: null
logdir: null
pre_rocket: null
post_rocket: null

You can read more about FireWorks queue adapters at the Materials Project queue adapter page.
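
If you switch between several queue settings, one option is to keep the example adapter as a base and render variants from it. The sketch below is only an illustration: render_qadapter and BASE_QADAPTER are hypothetical names, the base values are copied from the example above, and "m0000" is a placeholder account. Strings are written as double-quoted scalars so that values such as the walltime stay strings.

```python
import json
from typing import Dict, Optional, Union

Value = Optional[Union[str, int]]

# Settings copied from the example my_qadapter.yaml.
BASE_QADAPTER: Dict[str, Value] = {
    "_fw_name": "CommonAdapter",
    "_fw_q_type": "SLURM",
    "rocket_launch": "rlaunch -l my_launchpad.yaml rapidfire",
    "ntasks": 1,
    "cpus_per_task": 1,
    "ntasks_per_node": 1,
    "walltime": "00:02:00",
    "queue": "debug",
    "constraint": "haswell",
    "account": None,
    "job_name": None,
    "logdir": None,
    "pre_rocket": None,
    "post_rocket": None,
}


def render_qadapter(**overrides: Value) -> str:
    """Return my_qadapter.yaml text: the base settings with any overrides applied."""
    settings = {**BASE_QADAPTER, **overrides}
    lines = []
    for key, value in settings.items():
        if value is None:
            rendered = "null"
        elif isinstance(value, int):
            rendered = str(value)
        else:
            # A JSON string literal is also a valid YAML double-quoted scalar.
            rendered = json.dumps(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


# "m0000" is a placeholder; use your own NERSC account.
qadapter_text = render_qadapter(walltime="00:05:00", account="m0000")
print(qadapter_text)
```
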

Launching a FireWorks job via SLURM
lpad reset
lpad add fw_test.yaml
qlaunch singleshot

The result should look something like:

(fireworks) elvis@cori10:~/fw_test> qlaunch singleshot
2021-02-22 21:16:43,317 INFO moving to launch_dir /global/u1/s/elvis/fw_test
2021-02-22 21:16:43,318 INFO submitting queue script
2021-02-22 21:16:45,847 INFO Job submission was successful and job_id is 39771362

This means your fw_test.yaml was submitted successfully to SLURM.

To submit more tasks while keeping at most 3 jobs in the queue at a time, you can do the following:

lpad reset
lpad add fw_test.yaml
lpad add fw_test.yaml
lpad add fw_test.yaml
qlaunch rapidfire -m 3

To submit tasks in "infinite mode" which maintains 2 jobs in the queue until the LaunchPad is empty:

lpad reset
lpad add fw_test.yaml
lpad add fw_test.yaml
lpad add fw_test.yaml
lpad add fw_test.yaml
qlaunch rapidfire -m 2 --nlaunches infinite

You can find more information about launching FireWorks jobs via a queue at the Materials Project queue page.

For information about how to check the status of your FireWorks tasks or workflows, please see the Materials Project query tutorial page.
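
The command sequences above follow a pattern: add some number of FireWorks, then start qlaunch with a queue limit. As a sketch, the helper below builds that sequence as a list of commands and prints it without running anything. The name plan_queue_run is hypothetical. It only uses the commands and flags shown in this page, and it deliberately leaves out lpad reset, since that clears the LaunchPad.

```python
import shlex
from typing import List


def plan_queue_run(firework_file: str, copies: int, max_jobs_in_queue: int,
                   infinite: bool = False) -> List[List[str]]:
    """Return the commands to run, in order, without executing any of them."""
    commands = [["lpad", "add", firework_file] for _ in range(copies)]
    qlaunch = ["qlaunch", "rapidfire", "-m", str(max_jobs_in_queue)]
    if infinite:
        qlaunch += ["--nlaunches", "infinite"]
    commands.append(qlaunch)
    return commands


plan = plan_queue_run("fw_test.yaml", copies=4, max_jobs_in_queue=2, infinite=True)
for command in plan:
    print(shlex.join(command))
```

Each inner list can be passed to subprocess.run once you are happy with the plan.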

Running an MPI task with FireWorks

Many workflow engines cannot support running MPI-based tasks, but FireWorks can. We'll demonstrate with an example.

Create a FireWorks conda environment for MPI tasks
module load python
conda create -n fireworks-mpi --clone lazy-mpi4py
conda activate fireworks-mpi
mamba install fireworks pytest -c conda-forge

Create a new FireWork fw_test_mpi.yaml to run our MPI test workload:

spec:
  _tasks:
  - _fw_name: ScriptTask
    script: srun python -m mpi4py.bench helloworld

You'll need to update your my_qadapter.yaml to allocate several MPI tasks.

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -l my_launchpad.yaml rapidfire
constraint: haswell
ntasks: 10
account: <your account>
walltime: '00:02:00'
queue: regular
job_name: null
logdir: null
pre_rocket: null
post_rocket: null

Run our MPI test program in singleshot mode
lpad reset
lpad add fw_test_mpi.yaml
qlaunch singleshot

The submission should look like:

(fireworks-pm) elvis@perlmutter:login34:~/fw_test> qlaunch singleshot
2022-04-04 12:59:47,611 INFO moving to launch_dir /global/u1/s/elvis/fw_test
2022-04-04 12:59:47,633 INFO submitting queue script
2022-04-04 12:59:47,679 INFO Job submission was successful and job_id is 1704603

Here is some example output:

2022-04-04 15:58:21,603 INFO Hostname/IP lookup (this will take a few seconds)
2022-04-04 15:58:22,364 INFO Created new dir /global/u1/s/elvis/fw_test/launcher_2022-04-04-22-58-22-362433
2022-04-04 15:58:22,365 INFO Launching Rocket
2022-04-04 15:58:22,385 INFO RUNNING fw_id: 1 in directory: /global/u1/s/elvis/fw_test/launcher_2022-04-04-22-58-22-362433
2022-04-04 15:58:22,405 INFO Task started: ScriptTask.
Hello, World! I am process 0 of 10 on nid00932.
Hello, World! I am process 1 of 10 on nid00932.
Hello, World! I am process 2 of 10 on nid00932.
Hello, World! I am process 3 of 10 on nid00932.
Hello, World! I am process 4 of 10 on nid00932.
Hello, World! I am process 5 of 10 on nid00932.
Hello, World! I am process 6 of 10 on nid00932.
Hello, World! I am process 7 of 10 on nid00932.
Hello, World! I am process 8 of 10 on nid00932.
Hello, World! I am process 9 of 10 on nid00932.
2022-04-04 15:58:23,717 INFO Task completed: ScriptTask
2022-04-04 15:58:23,739 INFO Rocket finished
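
For larger jobs you will not want to count the hello lines by eye. Here is a minimal sketch that checks whether every MPI rank reported in. It assumes the output lines have exactly the format shown above, and the function name missing_ranks is ours. The expected size should match the ntasks value in my_qadapter.yaml.

```python
import re
from typing import List

HELLO = re.compile(r"Hello, World! I am process (\d+) of (\d+) on (\S+)\.")


def missing_ranks(output: str, expected_size: int) -> List[int]:
    """Return the ranks in 0..expected_size-1 that never printed a hello line."""
    seen = {
        int(match.group(1))
        for match in HELLO.finditer(output)
        if int(match.group(2)) == expected_size
    }
    return [rank for rank in range(expected_size) if rank not in seen]


complete = "\n".join(
    f"Hello, World! I am process {rank} of 10 on nid00932." for rank in range(10)
)
truncated = "\n".join(complete.splitlines()[:8])

print(missing_ranks(complete, 10))   # every rank reported, so the list is empty
print(missing_ranks(truncated, 10))  # the last two ranks are missing
```
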

Running a FireWorks GPU job on Perlmutter

Create a FireWorks conda environment for GPU tasks
module load python
conda create -n fireworks-pm python=3.9 -y
conda activate fireworks-pm
mamba install fireworks cupy pytest -c conda-forge

GPU CuPy program we'll run (cupy_eigh.py)
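import numpy as np
import cupy as cp


def cupy_eigh(input_data, precision):
    """Compute eigenvalues on the GPU and return them as a NumPy array."""
    # Copy the input to GPU memory with the requested precision
    x = cp.asarray(input_data, dtype=precision)

    # eigh treats x as symmetric; the eigenvectors v are not used here
    w, v = cp.linalg.eigh(x)

    # Move the eigenvalues back to the CPU
    w_cpu = cp.asnumpy(w)

    return w_cpu


# Build a reproducible 1000 x 1000 random test matrix on the CPU
asize = 1000
rng = np.random.default_rng(seed=42)
randarray = rng.random((asize, asize))

input_data = randarray
precision = 'float64'

w_cpu = cupy_eigh(input_data, precision)

print(w_cpu)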

Create a new FireWork fw_test_gpu.yaml to run our GPU test workload:

spec:
  _tasks:
  - _fw_name: ScriptTask
    script: srun python $HOME/fw_test/cupy_eigh.py

You'll need to update your my_qadapter.yaml to allocate GPU resources.

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -l my_launchpad.yaml rapidfire
constraint: gpu
ntasks: 1
cpus_per_task: 128
gpus_per_task: 1
account: <your account>
walltime: '00:02:00'
queue: gpu_regular
job_name: null
logdir: null
pre_rocket: null
post_rocket: null

Run our GPU test program in singleshot mode
lpad reset
lpad add fw_test_gpu.yaml
qlaunch singleshot

The submission should look like:

(fireworks-pm) elvis@perlmutter:login34:~/fw_test> qlaunch singleshot
2022-04-04 12:59:47,611 INFO moving to launch_dir /global/u1/s/elvis/fw_test
2022-04-04 12:59:47,633 INFO submitting queue script
2022-04-04 12:59:47,679 INFO Job submission was successful and job_id is 1704603

Here is some example (abbreviated) output:

2022-04-04 14:36:37,966 INFO Hostname/IP lookup (this will take a few seconds)
2022-04-04 14:36:38,276 INFO Created new dir /global/u1/s/elvis/fw_test/launcher_2022-04-04-21-36-38-275352
2022-04-04 14:36:38,276 INFO Launching Rocket
2022-04-04 14:36:38,292 INFO RUNNING fw_id: 2 in directory: /global/u1/s/elvis/fw_test/launcher_2022-04-04-21-36-38-275352
2022-04-04 14:36:38,481 INFO Task started: ScriptTask.
[-1.81188101e+01 -1.79220171e+01 -1.77867352e+01 -1.77242020e+01
 -1.76233284e+01 -1.75455378e+01 -1.73279111e+01 -1.72173740e+01
 -1.71490059e+01 -1.71065062e+01 -1.69899352e+01 -1.69071616e+01
 -1.68864381e+01 -1.68246481e+01 -1.66270042e+01 -1.65552821e+01
 -1.65027105e+01 -1.64155099e+01 -1.63703294e+01 -1.62853948e+01
 -1.62594976e+01 -1.61890458e+01 -1.61514735e+01 -1.60706853e+01
 -1.60518401e+01 -1.60134389e+01 -1.59594303e+01 -1.59196455e+01
 -1.58262790e+01 -1.57379236e+01 -1.56650916e+01 -1.56105525e+01
...
2022-04-04 14:38:12,489 INFO Task completed: ScriptTask
2022-04-04 14:38:12,505 INFO Rocket finished

Display the FireWorks dashboard

The FireWorks dashboard can be viewed on your laptop/desktop if you connect to Cori with ssh port forwarding. For example, connecting using an sshproxy key:

ssh -L 5000:localhost:5000 cori.nersc.gov

Once you're on Cori, navigate to the directory where you have initialized your launchpad and start up the FireWorks webgui:

module load python
conda activate fireworks
cd $HOME/fw_test
lpad webgui

This will open a summary; you can hit q to exit. Your console should display:

(fireworks) elvis@cori06:~/fw_test> lpad webgui
 * Serving Flask app "fireworks.flask_site.app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Leave this window running. You can CTRL+C to kill your FireWorks dashboard when you're done.

Since you've forwarded port 5000 from Cori to port 5000 on your local machine, you can open a browser and navigate to

http://127.0.0.1:5000/

You should be able to see and interact with the FireWorks dashboard tracking your jobs on Cori. You'll have to periodically refresh the browser page to get updated job status. Note that if you'd like to track jobs on Perlmutter, you'll need to ssh directly to Perlmutter instead.
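
If port 5000 is already in use on your laptop, you can forward a different local port to port 5000 on the login node. The sketch below just builds the ssh command and the matching browser URL. The helper names tunnel_command and dashboard_url are hypothetical, and local port 5050 is an arbitrary example.

```python
import shlex
from typing import List


def tunnel_command(host: str, remote_port: int = 5000, local_port: int = 5000) -> List[str]:
    """Build the ssh command that forwards local_port to remote_port on the login node."""
    return ["ssh", "-L", f"{local_port}:localhost:{remote_port}", host]


def dashboard_url(local_port: int = 5000) -> str:
    """Return the address to open in a local browser once the tunnel is up."""
    return f"http://127.0.0.1:{local_port}/"


command = tunnel_command("cori.nersc.gov", local_port=5050)
print(shlex.join(command))
print(dashboard_url(5050))
```

The remote port stays at 5000 because that is where lpad webgui listens in the console output above.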

[Figure: screenshot of the FireWorks dashboard]

If you have any questions or problems using FireWorks at NERSC, please contact us at help.nersc.gov.