R

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, such as linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, and clustering, and it is highly extensible.

R provides an Open Source route to express statistical methodologies; it is a GNU project with similarities to the S language and environment. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

R at NERSC

Type the following commands to launch R:

$ module load R
$ R

To run R interactively on a compute node, request an interactive allocation and start R inside it.

$ salloc --qos=interactive -C knl --time=234
$ module load R
$ R

To run R through a batch job, make a script like the following and submit it.

#!/bin/bash
#SBATCH -C knl
#SBATCH --qos=regular

module load R
R CMD BATCH code.R

The contents of code.R might look like this:

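# Build the output file name from an index, e.g. "myimag1.pdf"
j <- 1
imagfilename <- paste0("myimag", j, ".pdf")
# pdf() takes width and height in inches, not pixels
pdf(file = imagfilename, width = 8, height = 8)
x <- 1:10
plot(x, main = "R is fun")
# Close the device so the file is written to disk
dev.off()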
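
The file name in code.R is built from the index j, which suggests writing one file per plot. A minimal sketch of that pattern follows. It assumes a hypothetical output directory (a temporary one here) and placeholder data; the helper name make_plot is also made up for illustration.

```r
# Hypothetical output directory; a temporary one is used here for illustration
outdir <- file.path(tempdir(), "plots")
dir.create(outdir, showWarnings = FALSE, recursive = TRUE)

# Write one PDF per index j, mirroring the naming scheme in code.R
make_plot <- function(j, outdir) {
  imagfilename <- file.path(outdir, paste0("myimag", j, ".pdf"))
  pdf(file = imagfilename, width = 8, height = 8)  # sizes are in inches
  on.exit(dev.off())  # close the device even if plot() fails
  x <- 1:10
  plot(x * j, main = paste("Plot", j))  # placeholder data
  imagfilename
}

# Returns a character vector with the paths of the files written
plot_files <- vapply(1:3, make_plot, character(1), outdir = outdir)
```

In a batch job you would likely point outdir at a directory in your own project or scratch space rather than a temporary directory.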

Submit your job script with:

sbatch myscript.sh

Creating R environments with Anaconda

We strongly encourage users to use Anaconda to create tailored R environments. This is typically the quickest way to install R packages, especially if those packages have additional dependencies on other libraries.

To get started, create a conda environment and add packages using conda by following these steps.

$ module load R/3.6.1-conda
$ conda env remove -n myr
$ conda create -n myr -c r r-essentials
$ source activate myr

The name of your environment is up to you

Choose a name that reflects the purpose of this environment. You can also use this to create development environments and production environments that may have different versions of packages.
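
After activating an environment, it can be useful to confirm that the R you launched is the one inside it. The following is a minimal sketch using only base R. It relies on the CONDA_PREFIX environment variable, which conda sets on activation, and assumes the environment's R lives under that prefix.

```r
# Report which R installation and library paths are in use
r_info <- list(
  version   = R.version.string,
  home      = R.home(),
  lib_paths = .libPaths()
)
print(r_info)

# CONDA_PREFIX is empty when no conda environment is active
conda_prefix <- Sys.getenv("CONDA_PREFIX")
in_conda_env <- nzchar(conda_prefix) &&
  startsWith(normalizePath(R.home(), mustWork = FALSE),
             normalizePath(conda_prefix, mustWork = FALSE))
cat("R is running from the active conda environment:", in_conda_env, "\n")
```

If the last line reports FALSE after activation, the R on your PATH is probably coming from somewhere other than the environment.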

Conda Environments can be used in Jupyter

If the r-irkernel package is installed in your environment and you create a kernel-spec file as shown below, the environment should show up in the list of available kernels in Jupyter (https://jupyter.nersc.gov/).

$ source activate myr
$ conda install -c r r-irkernel
$ R
> IRkernel::installspec()
> q()
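
Called without arguments, IRkernel::installspec() registers the kernel under the default name "ir", so a second environment registered the same way would replace the first. One possible way to keep them apart is to pass the name and displayname arguments. This is a sketch only: the helper register_r_kernel and the "ir-" naming scheme are hypothetical, not a NERSC convention.

```r
# Register the current R as a Jupyter kernel under an environment-specific name.
# Assumes the IRkernel package is installed in the active environment.
register_r_kernel <- function(env_name) {
  if (!requireNamespace("IRkernel", quietly = TRUE)) {
    stop("IRkernel is not installed in this environment")
  }
  IRkernel::installspec(
    name        = paste0("ir-", env_name),      # kernel directory name
    displayname = paste0("R (", env_name, ")")  # label shown in Jupyter
  )
}

# Example call, run once inside the activated environment:
# register_r_kernel("myr")
```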

See the Conda documentation in the Python docs for more tips on using Conda environments.

How to Run R Code in Parallel

The following program illustrates how R can be used for 'coarse-grained parallelization', which is particularly useful when chunks of the computation are unrelated and do not need to communicate in any way. The example below uses the parallel package to create workers as lightweight processes via forking. This approach is very useful for speeding up code that uses lapply, sapply, apply, and related functions:

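library("parallel")

# Sum the integers 1..x with an explicit loop (deliberately slow work)
f <- function(x) {
  total <- 0  # avoid masking base::sum
  for (i in seq_len(x)) total <- total + i
  total
}

n <- 1000
# Number of cores detected on the node
nCores <- detectCores()
# Apply f to 1..n across forked workers; the result is a list of length n
result <- mclapply(X = 1:n, FUN = f, mc.cores = nCores)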
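
In a batch job, detectCores() reports the cores on the node, which may be more than the job was given. The following minimal sketch takes the worker count from Slurm's SLURM_CPUS_PER_TASK variable when it is set (it is only defined if the job requested --cpus-per-task) and falls back to detectCores() otherwise. It then checks the parallel result against the closed form x(x+1)/2. The helper name pick_cores is hypothetical.

```r
library(parallel)

# Hypothetical helper: prefer the CPU count Slurm granted, else what R detects
pick_cores <- function() {
  slurm <- suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")))
  if (!is.na(slurm) && slurm > 0) return(slurm)
  detected <- detectCores()
  if (is.na(detected)) 1L else detected
}

# Same workload as above: sum the integers 1..x with an explicit loop
f <- function(x) {
  total <- 0
  for (i in seq_len(x)) total <- total + i
  total
}

n <- 1000
# mclapply forks workers, so it only runs in parallel on Unix-like systems
result <- mclapply(1:n, f, mc.cores = pick_cores())

# mclapply returns a list; flatten it and compare with x * (x + 1) / 2
sums <- unlist(result)
stopifnot(isTRUE(all.equal(sums, (1:n) * ((1:n) + 1) / 2)))
```

Because the chunks are independent, replacing mclapply with lapply gives the same values serially, which is a convenient way to debug f before scaling up.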

References