

Build WRF

Required modules

The majority of the WRF model code is written in Fortran, but some parts and ancillary programs are written in C (WRF UG ch. 2). In most cases, we run WRF with shared-memory parallelism through the OpenMP application programming interface,
distributed-memory message-passing (MPI) parallelism across nodes, or both combined as a hybrid. We therefore need Fortran and C compilers that support OpenMP, along with the MPI library. For compiling such a complex program, NERSC provides compiler wrappers that combine the compilers with the various libraries (including MPI) necessary to run shared- and distributed-memory programs on the NERSC systems.

In addition, WRF requires the netCDF library for input and output. WRF can use the parallel netCDF library to read/write netCDF files through multiple MPI tasks simultaneously, taking advantage of the Lustre file system of the Perlmutter scratch space. WRF can also use the file compression functionality of netCDF 4.0 and later versions, which depends on the HDF5 library. See Balle & Johnsen (2016) for WRF I/O options and their performance.


The netCDF4 (and underlying HDF5) library also provides parallel read/write functionality, which is currently available as one of the I/O options in WRF (README.netcdf4par). However, experiments by a WRF-SIG member found that netCDF4 parallel I/O is significantly slower than I/O through the parallel netCDF library.


Another useful thing to know about the netCDF library is the limit on the size of a variable in a file, which depends on the netCDF data format (CDF1 = classic -> 2 GB, CDF2 = 64-bit offset -> 4 GB, netCDF4 and CDF5 -> unlimited). See the table "Large File Support" in the NetCDF Users Guide. The current WRF code supports serial I/O for CDF1, CDF2, and netCDF4. WRF's interface to the parallel netCDF library supports CDF1 and CDF2.


If a user runs a high-resolution, large-domain simulation with more than roughly 1500 x 1500 columns, a 3D variable will be larger than 4 GB, and it is necessary to modify WRF's I/O source code to use the CDF5 format.

Our experience shows that using the netCDF and parallel netCDF libraries together provides flexible I/O (either serial netCDF or parallel netCDF can be selected in the run-time WRF namelist) and much faster I/O (parallel netCDF I/O is 10--20 times faster than serial netCDF I/O) on the scratch system. We therefore recommend building WRF with the netCDF (cray-netcdf module) and parallel netCDF (cray-parallel-netcdf module) libraries. This I/O choice is activated by setting a few environment variables when compiling WRF after loading the netCDF and parallel netCDF modules. With these two modules, we set the following environment variables when compiling WRF:

module load cray-hdf5   #the netcdf library depends on hdf5
module load cray-netcdf
module load cray-parallel-netcdf

export NETCDF_classic=1               #use classic (CDF1) as default
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 #use 64-bit offset format (CDF2) of netcdf files
export USE_NETCDF4_FEATURES=0         #netcdf4 compression (serial) with the hdf5 module can be very slow

and then specify the CDF2 format for high-resolution simulations at run time by setting the namelist variables io_form_history, io_form_restart, etc., to 11 for parallel netCDF I/O or 2 for standard serial netCDF.
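For example, the corresponding entries in the &time_control section of namelist.input could look like this (a sketch; which streams you switch to parallel netCDF is up to you):

```
&time_control
 io_form_history  = 11   ! parallel netCDF for wrfout files
 io_form_restart  = 11   ! parallel netCDF for restart files
 io_form_input    = 2    ! standard serial netCDF
 io_form_boundary = 2    ! standard serial netCDF
/
```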


A discussion in the WRF user forum suggests that not only wrfinput but also wrfbdy data can be read through the parallel netCDF library if WPS is compiled appropriately with parallel netCDF support.

Example module loading script

set -e
#command example:
#./ pm  #"pm" for Perlmutter

machname=$1 #first input argument, use "pm" for Perlmutter

scname=$BASH_SOURCE  #name of this script

echo "loading modules using ${scname}"
echo "target system: ${machname} "

#users may want to unload unnecessary/conflicting modules loaded in .bash_profile. 
#e.g., hugepages. But keep other modules loaded automatically by the system. 
#each user has to edit here:
# module unload craype-hugepages64M

#general modules
if [ "$machname" = "pm" ]; then  #Perlmutter
    module load cpu
    module load PrgEnv-gnu
    module load gcc
    module load cray-mpich
    module load craype
    #module load craype-hugepages64M #hugepages may not be needed for Perlmutter
    #also encountered run-time error with hugepages module after the 2022-12 maintenance
else
    echo "the machname argument is not a valid system name"
    exit 11
fi

#module for WRF file I/O
#order of loading matters!
module load cray-hdf5  #required to load netcdf library
module load cray-netcdf 
module load cray-parallel-netcdf

module list

Build WRF on Perlmutter

WRF's build process starts with running the "configure" csh script that comes with the WRF source code package. This script automatically checks the computing platform and asks for user input about the parallel job configuration.

On Perlmutter, we have tested the default gnu environment. Tested inputs to the "configure" csh script are gnu (dm+sm) and basic nesting:

Please select from among the following Linux x86_64 options:
32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)
Enter selection [1-75] :
Compile for nesting? (0=no nesting, 1=basic,...) [default 0] :

For real cases (not idealized cases like the 2D squall line), we recommend option 35 (dm+sm), based on our experience that 4 threads per MPI rank (dm+sm) performs better than pure MPI (dm) on the same number of nodes. We will update the WRF performance evaluation & scaling on Perlmutter in late 2023.

After running the configure script, we run the "compile" csh script in the top directory of the WRF source code. In the example bash script below, each step is switched on or off with boolean variables such as

doclean_all=false #set to true if you previously compiled with different configure options




The compile script does several checks and invokes the make command, among other things.

Example WRF build script for Perlmutter

#!/bin/bash -l
set -e
set -o pipefail 

imach="pm"  #target system name. "pm" for Perlmutter.

#change the following boolean variables to run/skip certain compiling steps
doclean_all=true #true if compiled different configure options


runconf=true    #run WRF's configure script; should do this first before compiling

docompile=false  #run WRF's compile script; should do this after configure

debug=false  #true to compile WRF with debug flag (no optimizations, -g flag for debugger, etc.)

# WRF directories
#WRF-SIG project directory as example; accessible only by WRF-SIG members

export WRF_DIR=${wrfroot}/${mversion}/WRF

#Modules --------------------------------------------------------------------
modversion="2022-12"  #year-month of the major update in which the (default) modules were introduced (INC0182147)
source ${loading_script} ${imach}

#set environmental variables used by WRF build system, borrowing environmental variables 
#set by modules

export NETCDF_classic=1               #use classic (CDF1) as default
export WRFIO_NCD_LARGE_FILE_SUPPORT=1 #use 64-bit offset format (CDF2) of netcdf files
export USE_NETCDF4_FEATURES=0         #do not use netcdf4 compression (serial), need hdf5 module
#configure says WRF won't use netcdf4 compression, but I still see a flag

export HDF5=$HDF5_DIR
export HDF5_LIB="$HDF5_DIR/lib"
export HDF5_BIN="$HDF5_DIR/bin"


#need to append "/gnu/9.1"

export LD_LIBRARY_PATH="/usr/lib64":${LD_LIBRARY_PATH}
#export PATH=${NETCDF_BIN}:${PATH}

#other special flags to test
export PNETCDF_QUILT="0"  #not stable and best configuration not figured out on Perlmutter

#check environment variables
echo "PATH: "$PATH

echo "NETCDF is $NETCDF"

echo "HDF5 is $HDF5"
echo "HDF5_LIB is $HDF5_LIB"


##capture starting time for log file name
idate=$(date "+%Y-%m-%d-%H_%M")
##run make in the top directory

if [ "$doclean_all" = true ]; then
    ./clean -a
    #"The './clean -a' command is required if you have edited the configure.wrf
    #or any of the Registry files.", but this deletes configure.wrf....
fi

#echo "running configure"
if [ "$runconf" = true ]; then

    if [ "$debug" = true ]; then
        echo "configure debug mode"
        ./configure -d
    else
        ./configure
    fi

    ##configure options selected are:
    # 32. (serial)  33. (smpar)  34. (dmpar)  35. (dm+sm)   GNU (gfortran/gcc)
    # choose 35 for real (not idealized) cases

    configfile="configure.wrf"  #file written by the configure script

    #the sed commands below will edit the following in configure.wrf
    #from the original
    #SFC             =       gfortran
    #SCC             =       gcc
    #CCOMP           =       gcc
    #DM_FC           =       mpif90
    #DM_CC           =       mpicc

    #to the following (FC and CC with MPI)
    #SFC             =       gfortran
    #SCC             =       gcc
    #CCOMP           =       cc
    #DM_FC           =       ftn
    #DM_CC           =       cc

    if [ -f "$configfile" ]; then
        echo "editing configure.wrf"
        #need to remove -cc=$(SCC) in DM_CC
        sed -i 's/-cc=\$(SCC)/ /' ${configfile}
        sed -i 's/mpif90/ftn/' ${configfile}
        sed -i 's/mpicc/cc/' ${configfile}

        #a user can also remove the flag -DWRF_USE_CLM from ARCH_LOCAL to speed up
        #compilation if not planning to use the CLM4 land model
        #sed -i 's/-DWRF_USE_CLM/ /' ${configfile}
    fi
fi

if [ "$docompile" = true ]; then
    export J="-j 4"  #build in parallel
    echo "J = $J"

    bldlog="compile_em_real_${idate}_${imach}.log"  #compile log file name
    echo "compile log file is ${bldlog}"

    #run the compile script
    ./compile em_real &> ${bldlog}
    #./compile em_les &> ${bldlog}  #idealized LES case used for one of benchmark tests

    set +e
    #grep command exits the script in case of no match after the 2022-12 maintenance
    grep "Problems building executables" ${bldlog}
    RESULT=$?  #0 if the failure message was found
    set -e

    if [ $RESULT -eq 0 ]; then
        echo "compile failed, check ${bldlog}"
    else
        echo "compile success"
        #sometimes renaming executables with descriptive information is useful
        #cp $WRF_DIR/main/ideal.exe $WRF_DIR/main/ideal_${idate}_${imach}.exe
        #cp $WRF_DIR/main/real.exe $WRF_DIR/main/real_${idate}_${imach}.exe
        #cp $WRF_DIR/main/wrf.exe $WRF_DIR/main/wrf_${idate}_${imach}.exe
        #cp $WRF_DIR/main/ndown.exe $WRF_DIR/main/ndown_${idate}_${imach}.exe
    fi
fi


As seen in the example script, the compiler names need to be edited in the configure.wrf file. Specifically, we need to change the compiler names for MPI applications to compiler wrappers.

  • change "mpif90" to "ftn" for the DM_FC flag
  • change "mpicc" to "cc" for the DM_CC flag
  • keep SFC and SCC to be the base compiler (gfortran and gcc)

These edits allow us to compile WRF on the login node.
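The effect of those sed substitutions can be previewed on a small stand-in file (a sketch; configure_frag.txt is a made-up name, and the real configure.wrf contains many more lines):

```shell
# write a tiny stand-in for the relevant configure.wrf lines
cat > configure_frag.txt <<'EOF'
DM_FC           =       mpif90
DM_CC           =       mpicc -cc=$(SCC)
EOF

# apply the same substitutions used in the build script
sed -i 's/-cc=\$(SCC)/ /' configure_frag.txt
sed -i 's/mpif90/ftn/'    configure_frag.txt
sed -i 's/mpicc/cc/'      configure_frag.txt

cat configure_frag.txt   # DM_FC now uses ftn, DM_CC uses cc
```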


The experience of WRF-SIG members and NERSC best practices recommend the following:

  1. Use the scratch space for model execution and set an appropriate file stripe count on the execution directory.

  2. Use the parallel netCDF library for I/O. Time spent writing history and restart files is reduced by at least 30%, and often to 1/10 of serial netCDF I/O time (an appropriate stripe count must be set on the output directory).

  3. Use four OpenMP threads per MPI task instead of using all the available physical cores for MPI tasks. On Perlmutter, for the CONUS 2.5 km benchmark case, an 8-node job with 4 OpenMP threads and 256 MPI ranks runs almost as fast as a 16-node job with 2048 MPI ranks and no OpenMP threads. The latter (16 nodes) is twice as expensive as the former (8 nodes); note that the charge to a project allocation depends on the number of nodes used and the wall-clock hours, among other factors (see Compute Usage Charging).
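As a quick sanity check of such a layout, the per-node CPU budget can be verified with shell arithmetic (a sketch; 256 logical CPUs per Perlmutter CPU node, 64 tasks per node, and 4 threads per task are the values used in the sbatch example below):

```shell
# MPI tasks per node x OpenMP threads per task should not exceed
# the logical CPUs available on one node
tasks_per_node=64
omp_threads=4
logical_cpus_per_node=256   # Perlmutter CPU node: 128 cores x 2 hyperthreads
used=$((tasks_per_node * omp_threads))
if [ "$used" -le "$logical_cpus_per_node" ]; then
    echo "OK: ${used} of ${logical_cpus_per_node} logical CPUs per node"
else
    echo "oversubscribed: ${used} > ${logical_cpus_per_node}"
fi
```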

Example WRF sbatch script for Perlmutter

#!/bin/bash -l
#SBATCH -q debug
#SBATCH -t 00:30:00
#SBATCH -J test
#SBATCH -A <account>   #user needs to change this
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<email address>  #and this
#SBATCH -C cpu
#SBATCH --tasks-per-node=64

#debug queue on perlmutter limited to max of 4 nodes
icase="test" #name of simulation, used as the name of the execution directory
ntile=4  #number of OpenMP threads per MPI task
#need to set the "numtiles" variable in the wrf namelist (namelist.input) to be the same 

#example using the WRFSIG project CFS directories; 
#files only accessible by the WRFSIG members

rundir="/pscratch/sd/e/elvis/simulation/WRF/${icase}" #user needs to change this

#Modules --------------------------------------------------------------------
modversion="2022-12"  #year-month of the major update in which the (default) modules were introduced (INC0182147)
source ${loading_script} ${imach}

#OpenMP settings:
export OMP_NUM_THREADS=$ntile
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
#export OMP_AFFINITY_FORMAT="host=%H, pid=%P, thread_num=%n, thread affinity=%A"

#MPI state and statistics
#export MPICH_MEMORY_REPORT=3 #can lead to a run-time error after the 2022-12 maintenance

cd $rundir

mkdir -p logs

#capture starting time
tstart=$(date "+%s")

#run simulation
srun -n 64 -c 4 --cpu_bind=cores ${bindir}/${binname}

#capture error code
srunerr=$?
echo "srun exit code: ${srunerr}"
#capture ending time
tend=$(date "+%s")
echo "elapsed time: $((tend - tstart)) seconds"

#rename and save the process 0 out and err files
cp rsl.error.0000 rsl.error_0_$SLURM_JOB_ID
cp rsl.out.0000 rsl.out_0_$SLURM_JOB_ID

To set appropriate options for the srun command (considering process and thread affinity), users are encouraged to use the job script generator available on the MyNERSC website.


Balle, T., & Johnsen, P. (2016). Improving I/O Performance of the Weather Research and Forecast (WRF) Model. Cray User Group.