
R

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and it is highly extensible.

R is a GNU project, similar to the S language and environment, that provides an Open Source route to expressing statistical methodologies. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

R at NERSC

Type the following commands to launch R:

$ module load R
$ R
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

To run R interactively on a compute node, request an interactive allocation with salloc and then start R inside it:

salloc --qos=interactive -C knl --time=234
module load R
R
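Once R starts inside the allocation, a quick sanity check is to confirm that the session is running on a compute node and to see how many cores R can detect. The sketch below is one way to do this using only base R and the bundled parallel package; it assumes Slurm sets SLURM_JOB_ID in the allocation's environment, which is standard Slurm behavior, and reports "(none)" when that variable is absent (for example, on a login node).

```r
library(parallel)

# Hostname of the machine R is running on; inside an salloc session
# this should be a compute node rather than a login node.
node <- Sys.info()[["nodename"]]

# Slurm sets SLURM_JOB_ID inside an allocation; Sys.getenv() returns ""
# when the variable is unset.
job_id <- Sys.getenv("SLURM_JOB_ID")

cat("Running on node:", node, "\n")
cat("Slurm job ID:", if (nzchar(job_id)) job_id else "(none)", "\n")
cat("Cores visible to R:", detectCores(), "\n")
```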

To run R in a batch job, create a job script such as the following (here called myscript.sh) and submit it.

#!/bin/bash
#SBATCH -C knl
#SBATCH --qos=regular

module load R
R CMD BATCH code.R

The contents of code.R might look like this:

j = 1
imagfilename = paste0('myimag', j, '.pdf')   # -> "myimag1.pdf"
# pdf() takes width and height in inches, not pixels
pdf(file = imagfilename, width = 8, height = 8)
x = 1:10
plot(x, main = 'R is fun')
dev.off()
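Because the output file name in code.R is built from j, the same script can produce one plot per task of a Slurm job array. The sketch below is a hypothetical variant, assuming the job script adds a directive such as #SBATCH --array=1-4 so that Slurm sets SLURM_ARRAY_TASK_ID for each task; when that variable is unset (for example, when testing interactively) it falls back to j = 1.

```r
# Hypothetical job-array variant of code.R.
# SLURM_ARRAY_TASK_ID is set by Slurm for each array task; default to "1"
# when it is not set.
j <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))

# One output file per task: myimag1.pdf, myimag2.pdf, ...
imagfilename <- paste0("myimag", j, ".pdf")

pdf(file = imagfilename, width = 8, height = 8)   # size in inches
x <- 1:10
plot(x * j, main = paste("R is fun, task", j))
invisible(dev.off())
```

Each array task then writes its own PDF, so tasks never overwrite each other's output.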

To submit the job script, run:

sbatch myscript.sh

Creating R environments with Anaconda

We strongly encourage users to use Anaconda to create tailored R environments. This is typically the quickest way to install R packages, especially if those packages have additional dependencies on other libraries.

To get started, create a conda environment and add packages to it with conda as follows:

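module load R/3.6.1-conda
# Optional: remove any existing environment with the same name
conda env remove -n myr
# Create an environment named myr with R and common packages from the r channel
conda create -n myr -c r r-essentials
source activate myr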
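After activating the environment, starting R and checking where it is installed is a simple way to confirm that the environment's R, rather than the module's default, is the one running. A minimal sketch, assuming ggplot2 is among the packages pulled in by r-essentials:

```r
# Run inside R started after "source activate myr".
# For a conda environment, R.home() and the library paths should point
# inside the environment's directory.
r_home <- R.home()
lib_paths <- .libPaths()

cat("R version:", R.version.string, "\n")
cat("R home:", r_home, "\n")
cat("Library paths:\n")
cat(paste0("  ", lib_paths, "\n"), sep = "")

# Check (without attaching) whether a package from r-essentials is installed
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
cat("ggplot2 available:", has_ggplot2, "\n")
```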

The name of your environment is up to you

Choose a name that reflects the purpose of this environment. You can also use this to create development environments and production environments that may have different versions of packages.

Conda environments can be used in Jupyter

If the r-irkernel package is installed in your environment and you install a kernel spec for it, your environment should show up in the list of available kernels in Jupyter (https://jupyter.nersc.gov/).

Give your kernel a unique name when you install the kernel spec; otherwise your local R kernel will supersede the system default.

$ source activate myr
$ conda install -c r r-irkernel
$ R
> IRkernel::installspec(name='myr', displayname='R 3.6 (myr)')
[InstallKernelSpec] Installed kernelspec myr in ~/.local/share/jupyter/kernels/myr
> quit()

See the Conda documentation in the Python docs for more tips on using Conda environments.

How to Run R Code in Parallel

The following program illustrates how R can be used for coarse-grained parallelization, which is particularly useful when chunks of the computation are independent and do not need to communicate. The example uses the parallel package to create workers as lightweight processes via forking; this approach is well suited to speeding up code built on lapply, sapply, apply and related functions:

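library("parallel")

# Sum of the integers 1..x, computed with an explicit loop
# (the accumulator is named total to avoid masking base::sum)
f = function(x)
{
  total = 0
  for (i in seq_len(x)) total = total + i
  return(total)
}

n = 1000
nCores <- detectCores()   # number of CPU cores on the node
# mclapply forks nCores worker processes and returns a list of length n
result = mclapply(X = 1:n, FUN = f, mc.cores = nCores)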
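Because the chunks are independent, the parallel result should be identical to a serial run. The sketch below checks this against a serial vapply and against the closed form x(x + 1)/2. It also guards against detectCores() returning NA, which the parallel documentation allows when the core count cannot be determined; the fallback to one worker is an illustrative choice, not a NERSC recommendation.

```r
library(parallel)

# Same computation as above: sum of the integers 1..x
f <- function(x) {
  total <- 0
  for (i in seq_len(x)) total <- total + i
  total
}

n <- 1000
# Use at least one worker even if detectCores() returns NA
nCores <- max(1L, detectCores(), na.rm = TRUE)

par_result <- unlist(mclapply(X = 1:n, FUN = f, mc.cores = nCores))
ser_result <- vapply(1:n, f, numeric(1))

# Parallel and serial results should agree and match x(x + 1)/2
stopifnot(identical(par_result, ser_result))
stopifnot(all(par_result == (1:n) * (1:n + 1) / 2))
```

Comparing against a serial run like this is a cheap way to catch mistakes when converting an lapply call to mclapply.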

References