R

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, and clustering, and it is highly extensible.

R provides an open-source route to expressing statistical methodology. It is a GNU project and is similar to the S language and environment. One of R's strengths is the ease with which well-designed, publication-quality plots can be produced, including mathematical symbols and formulae where needed. R is an integrated suite of software facilities for data manipulation, calculation, and graphical display.
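As a small illustration of the modelling and testing tools mentioned above, the sketch below fits a linear regression and runs a Welch two-sample t-test on `cars` and `sleep`, two example datasets that ship with R. The choice of datasets and models is ours, purely for illustration.

```r
# Fit a simple linear regression of stopping distance on speed,
# using the built-in 'cars' dataset (datasets package, attached by default)
fit <- lm(dist ~ speed, data = cars)
print(summary(fit))

# A classical two-sample test on the built-in 'sleep' dataset:
# extra hours of sleep for group 1 vs group 2 (Welch t-test by default)
tt <- t.test(extra ~ group, data = sleep)
print(tt)
```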

R at NERSC

Type the following commands to load the R module and launch R:

$ module load R
$ R
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
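Once R is running, a few base functions report which R build and which package library directories the session is using, which helps when several R modules or conda environments are installed. The helper name `describe_session` is hypothetical; the functions it calls are part of base R.

```r
# Summarize the current R session: version string, installation root,
# and the library directories searched by library()/require()
describe_session <- function() {
  list(
    version  = R.version.string,  # e.g. "R version 3.3.1 (2016-06-21)"
    home     = R.home(),          # root of the R installation in use
    libpaths = .libPaths()        # package library search path
  )
}

info <- describe_session()
str(info)
```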


To run R interactively on a compute node, request an interactive allocation and then start R inside it:

salloc --qos=interactive -C knl --time=234
R
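Inside the allocation, R can read the environment variables that Slurm sets for the job, which is a quick way to confirm that the session is running on a compute node rather than a login node. The sketch assumes the standard Slurm variables `SLURM_JOB_ID`, `SLURM_JOB_NODELIST` and `SLURM_CPUS_ON_NODE`; outside a Slurm job they are simply empty. `slurm_context` is a hypothetical helper name.

```r
# Report the Slurm job context visible to this R process.
# Sys.getenv() returns "" for variables that are not set,
# e.g. when R is started on a login node instead of a compute node.
slurm_context <- function() {
  vars <- c("SLURM_JOB_ID", "SLURM_JOB_NODELIST", "SLURM_CPUS_ON_NODE")
  values <- Sys.getenv(vars)  # named character vector, one entry per variable
  list(
    in_job = nzchar(values[["SLURM_JOB_ID"]]),
    values = values,
    logical_cpus = parallel::detectCores()  # hardware threads seen by R
  )
}

ctx <- slurm_context()
if (!ctx$in_job) message("Not inside a Slurm allocation")
print(ctx)
```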


To run R in a batch job, write a job script like the following and submit it.

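#!/bin/bash
#SBATCH -C knl
#SBATCH --qos=regular
#SBATCH --nodes=1
#SBATCH --time=00:30:00

module load R
# Run code.R non-interactively; its output is written to code.Rout
R CMD BATCH code.R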


The contents of code.R might look like this:

# Build an output file name such as myimag1.pdf
j <- 1
imagfilename <- paste0("myimag", j, ".pdf")
# pdf() width and height are given in inches
pdf(file = imagfilename, width = 8, height = 8)
x <- 1:10
plot(x, main = "R is fun")
dev.off()
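Because j is hard-coded in code.R, every run writes the same file. One way to reuse the script is to pass j on the command line: `commandArgs(trailingOnly = TRUE)` returns the arguments that follow `--args`, so a call such as `R CMD BATCH "--args 3" code.R` (or `Rscript code.R 3`) should make `3` available to the script. The helper `get_index` and its default of 1 are our own choices, shown as a sketch.

```r
# Read the figure index j from trailing command-line arguments,
# falling back to 1 when none is given (our choice of default).
get_index <- function(args = commandArgs(trailingOnly = TRUE), default = 1L) {
  if (length(args) == 0) return(default)
  j <- suppressWarnings(as.integer(args[[1]]))
  if (is.na(j)) stop("expected an integer index, got: ", args[[1]])
  j
}

j <- get_index()
imagfilename <- paste0("myimag", j, ".pdf")
pdf(file = imagfilename, width = 8, height = 8)  # size in inches
plot(1:10, main = "R is fun")
invisible(dev.off())
```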


To submit the job script, run:

sbatch myscript.sh
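To produce several figures from one submission, a Slurm job array is another option: adding a directive such as `#SBATCH --array=1-4` to the job script runs four copies of it, and Slurm sets `SLURM_ARRAY_TASK_ID` to a different value in each. A minimal sketch of reading that value in code.R, assuming such a directive; the fallback of 1 for runs outside an array and the helper name `task_id` are our own.

```r
# Use the Slurm array task ID as the figure index j.
# Outside a job array the variable is unset, so fall back to a default.
task_id <- function(default = 1L) {
  raw <- Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "")
  if (!nzchar(raw)) return(default)
  as.integer(raw)
}

j <- task_id()
imagfilename <- paste0("myimag", j, ".pdf")
message("Task ", j, " will write ", imagfilename)
```

With R CMD BATCH, it is worth giving each task its own log file, e.g. `R CMD BATCH code.R code_${SLURM_ARRAY_TASK_ID}.Rout`, so that the tasks do not overwrite each other's output.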


Creating R environments with Anaconda

We strongly encourage users to use Anaconda to create tailored R environments. This is typically the quickest way to install R packages, especially if those packages have additional dependencies on other libraries.

To get started, create a conda environment and add packages to it with conda, as follows:

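module load R/3.6.1-conda
# Remove any existing environment named myr so you start clean (optional)
conda env remove -n myr
# Create the environment with R plus a bundle of commonly used R packages
conda create -n myr -c r r-essentials
source activate myr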
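After activating the environment and starting R, it can be worth confirming that the conda-provided R is the one running and that the packages you expect are available. `R.home()` shows the installation root, which for a conda environment typically lies under the environment's directory, and `requireNamespace()` tests whether a package can be loaded without attaching it. The helper `check_packages` and the example package list are illustrative only.

```r
# Return a named logical vector: TRUE if each package can be loaded
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Typically points inside the active conda environment, e.g. .../envs/myr/lib/R
print(R.home())

# Example package list (hypothetical; adjust to what your project needs)
print(check_packages(c("stats", "ggplot2", "data.table")))
```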


The name of your environment is up to you

Choose a name that reflects the purpose of the environment. You can also create separate development and production environments that use different versions of packages.

Conda environments can be used in Jupyter

If the r-irkernel package is installed in your environment and you register a kernel spec for it, the environment should appear in the list of available kernels in Jupyter (https://jupyter.nersc.gov/). To set this up:

$ source activate myr
$ conda install -c r r-irkernel
$ R
> IRkernel::installspec()
> q()
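By default `IRkernel::installspec()` registers the kernel under the name `ir` with the display name "R", so running it from a second environment typically replaces the first registration. `installspec()` accepts `name` and `displayname` arguments; the wrapper below, a hypothetical convenience of our own, uses the environment name for both so that each environment gets its own entry in the Jupyter kernel list.

```r
# Register this environment's R as a Jupyter kernel under its own name.
# Hypothetical helper: env_name is whatever you called the conda environment.
install_env_kernel <- function(env_name) {
  if (!requireNamespace("IRkernel", quietly = TRUE)) {
    stop("IRkernel is not installed; run 'conda install -c r r-irkernel' first")
  }
  IRkernel::installspec(
    name = env_name,                            # kernel directory name
    displayname = paste0("R (", env_name, ")")  # label shown in Jupyter
  )
}

# Example (run inside R started from the activated environment):
# install_env_kernel("myr")
```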


See the Conda documentation in the Python docs for more tips on using Conda environments.

How to Run R Code in Parallel

The following program illustrates how R can be used for coarse-grained parallelization, which is particularly useful when chunks of the computation are independent and do not need to communicate with each other. The example uses the parallel package, whose mclapply() function creates workers as lightweight processes via forking. This approach is well suited to speeding up code that uses lapply, sapply, apply, and related functions:

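library(parallel)

# Sum the integers 1..x with an explicit loop (simple, independent work per task)
f <- function(x) {
  total <- 0
  for (i in seq_len(x)) total <- total + i
  total
}

n <- 1000
# detectCores() counts logical CPUs (hardware threads) on the node
nCores <- detectCores()
# Apply f to each of 1..n using nCores forked workers; result is a list of length n
result <- mclapply(X = 1:n, FUN = f, mc.cores = nCores)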
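Two refinements are often worth adding to the example above. On a compute node `detectCores()` counts every hardware thread, which may be more workers than the job was allotted, so the sketch below prefers the Slurm variable `SLURM_CPUS_PER_TASK` when it is set (an assumption about how the job was launched) and falls back to `detectCores()` otherwise. It also checks the parallel result against a serial `lapply()` and against the closed form x(x+1)/2. Forking is not available on Windows, where `mc.cores` must be 1. The block repeats `f` so that it runs on its own; `choose_cores` is a hypothetical helper.

```r
library(parallel)

# Same per-task work as above: sum of 1..x via an explicit loop
f <- function(x) {
  total <- 0
  for (i in seq_len(x)) total <- total + i
  total
}

# Prefer the CPU count Slurm granted this task; otherwise use all logical CPUs.
# (SLURM_CPUS_PER_TASK is only set when the job requests --cpus-per-task.)
choose_cores <- function() {
  slurm <- suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "")))
  if (!is.na(slurm) && slurm > 0) return(slurm)
  cores <- detectCores()
  if (is.na(cores)) 1L else cores
}

n <- 1000
nCores <- if (.Platform$OS.type == "windows") 1L else choose_cores()

par_result <- mclapply(1:n, f, mc.cores = nCores)
ser_result <- lapply(1:n, f)

# Parallel and serial runs should agree, and match the closed form x(x+1)/2
stopifnot(identical(par_result, ser_result))
stopifnot(all(unlist(par_result) == (1:n) * ((1:n) + 1) / 2))
```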