PyTorch is a high-productivity Deep Learning framework based on dynamic computation graphs and automatic differentiation. It is designed to be as close to native Python as possible for maximum flexibility and expressivity.
## Availability on Cori
PyTorch can be picked up from the Anaconda Python installations (e.g. via `module load python`) or from dedicated modules with distributed support (including MPI) enabled. You can see which versions are available with `module avail pytorch`.
### Current recommended version
The currently recommended version of PyTorch on Cori Haswell and KNL is the latest version, v1.4.0, which can be loaded with:

```
module load pytorch/v1.4.0
```
Want to integrate your own packages with PyTorch at NERSC? There are two suggested solutions:
- Install your packages on top of our PyTorch + Python installations - You can use the `$PYTHONUSERBASE` environment variable (set automatically when you load one of our modules) and user installations with `pip install --user ...` to install your own packages on top of our PyTorch installations.
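As a quick sanity check, you can confirm where `pip install --user` will place packages. This short snippet is illustrative and not NERSC-specific; it simply reads the user base that pip honors:

```python
import os
import site

# PYTHONUSERBASE (set automatically by the pytorch modules) controls
# where `pip install --user` places packages.
print("PYTHONUSERBASE:", os.environ.get("PYTHONUSERBASE", "(not set)"))
print("user base:", site.getuserbase())
print("user site-packages:", site.getusersitepackages())
```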
- Install PyTorch into your custom conda environments - You can set up a conda environment as described in our Python documentation and install PyTorch into it. If you do not need distributed support, you can install PyTorch via conda or pip as described at https://pytorch.org/get-started/locally/. If you need distributed support, it can be a little trickier. We share our build scripts for PyTorch at https://github.com/sparticlesteve/nersc-pytorch-build. Please open a support ticket at http://help.nersc.gov/ for assistance.
PyTorch makes it fairly easy to get up and running with multi-node training via its included `torch.distributed` package. Refer to the distributed tutorial for details: https://pytorch.org/tutorials/intermediate/dist_tuto.html
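As a minimal sketch (not taken from the tutorial), the following initializes `torch.distributed` and performs an all-reduce. The SLURM variable names and the gloo backend are assumptions for illustration; on Cori you would adapt the rank and world-size discovery to however you launch your jobs:

```python
import os
import torch
import torch.distributed as dist

def init_distributed(backend="gloo"):
    """Initialize torch.distributed from the environment (illustrative).

    SLURM_PROCID / SLURM_NTASKS are assumed here; we fall back to a
    single-process group so the sketch also runs outside a batch job.
    """
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("SLURM_PROCID", 0))
    world_size = int(os.environ.get("SLURM_NTASKS", 1))
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    return rank, world_size

rank, world_size = init_distributed()
# Sum a tensor across all ranks; with a single rank this is a no-op.
t = torch.ones(1)
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}/{world_size}: all-reduce result = {t.item()}")
dist.destroy_process_group()
```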
We're putting together a coherent set of example problems, datasets, models, and training code in this repository: https://github.com/NERSC/pytorch-examples
This repository can serve as a template for your research projects, with a flexibly organized layout and code structure. The `template` branch contains the core layout without the examples, so you can build your code on top of that minimal, fully functional setup. The code provided should minimize your own boilerplate and let you get up and running in a distributed fashion on Cori as quickly and seamlessly as possible.
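For a sense of what that boilerplate covers, a distributed training loop on CPU typically looks like the sketch below. The toy model, dataset, and gloo backend are assumptions for this example, not part of the NERSC template itself:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def train(rank, world_size, epochs=2):
    # Toy regression dataset and model, purely for illustration.
    dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
    # DistributedSampler gives each rank a disjoint shard of the data.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)

    # DDP averages gradients across ranks during backward().
    model = DistributedDataParallel(torch.nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle consistently across ranks
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradient all-reduce happens here
            optimizer.step()
    return loss.item()

# Single-process demo so the sketch runs anywhere; in a real job the
# rank, world size, and rendezvous address come from the launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)
final_loss = train(rank=0, world_size=1)
print("final loss:", final_loss)
dist.destroy_process_group()
```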
The examples include:
- A simple hello-world example
- HEP-CNN classifier
- ResNet50 CIFAR10 image classification
- HEP-GAN for generation of RPV SUSY images
Note: Currently the most up-to-date examples of how to run distributed PyTorch on NERSC systems are in our benchmarking repository: https://github.com/sparticlesteve/pytorch-benchmarks. We are in the process of updating the examples repository accordingly, but in the meantime you can refer to the benchmarking repository to learn how best to set things up.