How to use Python in Shifter
Do you:
- want better performance at large scales (10+ nodes) by improving library load times?
- want a more portable way to manage your Python stack?
- want an environment that is easy to use on a login node, compute node, or as a Jupyter kernel?
- want much more control over your software stack, for stability or legacy software reasons?
- feel tired of conda environments that make it hard to stay under your filesystem quota?
If any of these apply to you, you may find Shifter a good solution for using Python at NERSC.
We performed a small benchmarking study to compare Python performance on
/global/common/software, and Shifter. We summarize the results here:
|Number of nodes|| || ||Shifter|
This benchmark example supports our recommendation that users consider using Shifter at job sizes larger than 10 nodes. At large scale (100+ nodes), we strongly urge users to use Shifter. If Shifter is not an option, we suggest that users consider
At NERSC, our current container solution is Shifter. Below, we provide several example Python Dockerfiles intended to help get you started using Python in Shifter. You should be able to copy and use all of these Dockerfiles to build images on your own system. You will also find an mpi4py example on our main Shifter page.
Example Python Dockerfiles
Basic Python Dockerfile example
First we'll demonstrate a basic container with Python. We'll make it easy by starting from an image where Python 3 is already installed. Note that we are using the latest tag, so if you require a different version, you will need to adjust this tag. We'll install numpy and scipy using pip. If your Python setup is relatively simple, you may find that pip will meet your package installation requirements within an image. If your setup is more complex or if you rely on packages that are only distributed via conda, you'll want to skip ahead to our next example.
    FROM docker.io/library/python:latest
    WORKDIR /opt
    RUN \
        pip3 install \
            --no-cache-dir \
            numpy \
            scipy
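After building the image, one quick sanity check is to ask the container's interpreter whether the installed packages can be located. The following is a minimal sketch and not part of the NERSC example: the helper name `missing_packages` is hypothetical, and it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be located.

    Hypothetical helper for illustration; it only locates each module
    and does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # numpy and scipy are the packages installed by the Dockerfile above.
    missing = missing_packages(["numpy", "scipy"])
    if missing:
        print("not importable:", ", ".join(missing))
    else:
        print("all requested packages were found")
```

Run with the python3 provided by the image, an empty result suggests the pip step installed what you expected.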
Conda environment Dockerfile example
For those of you who are used to conda environments, there are a few key concepts that you will find different in containers. First, you won't want to build and activate a separate custom environment. Instead, you'll just want to install the packages you need into the base environment and then make this environment available by adding it to your PATH. We suggest that each image be used for a single Python environment. (If you find yourself needing multiple conda environments in the same image, most likely you'll want multiple images.) To save space, you'll likely want to start with miniconda. In this example, we'll start from an image in which miniconda has already been installed. As in our previous example, we'll install numpy and scipy.
    FROM docker.io/continuumio/miniconda3:latest
    ENV PATH=/opt/conda/bin:$PATH
    # --yes skips conda's confirmation prompt during a non-interactive build
    RUN /opt/conda/bin/conda install --yes numpy scipy
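Because this approach relies on PATH rather than on activating an environment, it can be worth checking which interpreter a bare `python` resolves to inside the image. The sketch below is an illustration only: the function name `resolves_under` is hypothetical, and `/opt/conda/bin` is simply the directory used in the Dockerfile above.

```python
import os
import shutil


def resolves_under(command, directory, path=None):
    """Return True if `command` resolves to an executable inside `directory`.

    `path` is a PATH-style string; None means use the current PATH.
    Hypothetical helper for illustration.
    """
    found = shutil.which(command, path=path)
    if found is None:
        return False
    return os.path.dirname(os.path.abspath(found)) == os.path.abspath(directory)


if __name__ == "__main__":
    # /opt/conda/bin is the directory prepended to PATH in the Dockerfile above.
    print(resolves_under("python", "/opt/conda/bin"))
```

If this prints `False` inside the container, another Python earlier on PATH is probably shadowing the conda installation.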
Python GPU Dockerfile example
If you plan to use Python on GPUs, you will likely find it easiest to start with an NVIDIA-provided image that includes CUDA and related libraries. This example demonstrates how to build an image to use Dask. In our example, we build FROM an NVIDIA CUDA base image. Note that in addition to base, NVIDIA also offers runtime and devel flavors of images.
In this example, we use mamba to speed up the package installation process. You can also see that we attempt to shrink our image by deleting whatever we can when we're done. This will reduce the time it takes to upload to the registry and download via Shifter. Note, however, that the NVIDIA images, even the base image, are quite large.
    FROM nvidia/cuda:11.2.1-base-ubuntu20.04
    ENV DEBIAN_FRONTEND=noninteractive
    WORKDIR /opt

    RUN \
        apt-get update && \
        apt-get upgrade --yes && \
        apt-get install --yes \
            wget \
            vim && \
        apt-get clean all && \
        rm -rf /var/lib/apt/lists/*

    #install miniconda
    #pin to python 3.8 for rapids compatibility
    ENV installer=Miniconda3-py38_4.9.2-Linux-x86_64.sh

    RUN wget https://repo.anaconda.com/miniconda/$installer && \
        /bin/bash $installer -b -p /opt/miniconda3 && \
        rm -rf $installer

    ENV PATH=/opt/miniconda3/bin:$PATH

    #use mamba to speed up package resolution
    RUN /opt/miniconda3/bin/conda install mamba -c conda-forge -y

    RUN \
        /opt/miniconda3/bin/mamba install \
            dask-cuda \
            dask-cudf \
            ipykernel \
            matplotlib \
            seaborn \
            -c rapidsai-nightly -c nvidia -c conda-forge -c defaults -y && \
        /opt/miniconda3/bin/mamba clean -a -y
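When looking for more to delete from an image, it helps to know which directories take up the most space. As a rough sketch (the helper name `dir_size_bytes` is hypothetical and not part of the NERSC example), the standard library can total the file sizes under a directory:

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of files under `root`, skipping symbolic links.

    Hypothetical helper for illustration; unreadable entries are skipped,
    and a missing `root` yields 0.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                continue  # file vanished or is unreadable
    return total


if __name__ == "__main__":
    # /opt/miniconda3 is the install prefix used in the Dockerfile above.
    print(dir_size_bytes("/opt/miniconda3"))
```

Comparing the totals for a few candidate directories before and after a cleanup step gives a sense of how much that step actually saves.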
If you have questions about any of these examples or about how to use Python in Shifter, we encourage you to contact NERSC's online help desk.