
Jupyter

Jupyter is an essential component of NERSC's data ecosystem. Use Jupyter at NERSC to:

  • Perform exploratory data analytics and visualization of data stored on the NERSC Global File System (NGF) or in databases at NERSC,
  • Guide machine learning through distributed training, hyperparameter optimization, model validation, prediction, and inference,
  • Manage workflows involving complex simulations and data analytics through the Cori batch queue,
  • ... or do other things we haven't thought of yet.

Jupyter is a flexible, popular literate-computing web application for creating notebooks. Notebooks are documents that combine computer code with rich text elements (paragraphs, equations, figures, widgets, links). They are human-readable records of an analysis and its results, but they are also executable data analytics artifacts. Each notebook is associated with a kernel, the process that actually executes its code. Notebooks can be shared or converted into static HTML documents, which makes them a powerful tool for reproducible research and teaching.

JupyterHub

JupyterHub provides a multi-user hub for spawning, managing, and proxying multiple instances of single-user Jupyter notebook servers. At NERSC, you authenticate to the JupyterHub instance we manage using your NERSC credentials and one-time password. Here is a link to NERSC's JupyterHub service: https://jupyter.nersc.gov/

When you log into JupyterHub at NERSC, you will see a console or "home" page with some buttons. These buttons allow you to manage notebook servers running on Cori or in Spin. Which notebook server should you use? It depends:

  • Cori
    • Spawns Jupyter notebooks on special-purpose large-memory nodes of Cori (cori13, cori14, cori19)
    • Exposes GPFS and Cori $SCRATCH
    • Default Python software environment is the same as one of the modules found on Cori
    • Notebooks can submit jobs to Cori batch queues via magic commands
  • Spin
    • Runs as a Spin service and is thus external to NERSC's Cray systems
    • Notebooks spawned by this service have access to GPFS (e.g. /global/cfs, $HOME)
    • Python software environments and kernels run in the Spin service, not on Cori

We consider the Cori notebook service the production service, and users should normally run their notebooks there. The Spin notebook service is a handy failover alternative when Cori login nodes are down, or when you have some other reason not to use Cori.

Tip

The nodes used by https://jupyter.nersc.gov/ are a shared resource, so please be careful not to use too many CPUs or too much memory. Treat them like regular login nodes.

Using Jupyter at NERSC for Events

Jupyter at NERSC can be used for demos, tutorials, or workshops. You can even use training accounts with Jupyter at NERSC. If you plan to use Jupyter in this way, we ask that you observe the following guidelines:

  • If 20 or fewer people at your event will be logging into jupyter.nersc.gov, there's no need to let us know ahead of time. We should be able to handle that level of increased load without any issues. Just be sure not to schedule your event on a day with scheduled maintenance.
  • For events where more than 20 people will be logging in, please give us a heads-up via ticket at least 1 month in advance. We've been able to absorb events of 50-100 people without any issues, but we still want to know about your event so we can monitor the service while it runs and help keep things going smoothly.
  • In either case, please let us know if you have any special requirements or would like to try something more experimental. This will likely require more lead time, but we're willing to work with you if there aren't already similar events coming up. In this case, please contact us via ticket at least 2 months in advance.

These are not hard and fast rules, but we're more likely to be able to help if we have advance notice.

JupyterLab

JupyterLab is the next generation of Jupyter. It provides a way to use notebooks, text editors, terminals, and custom components together. Documents and activities can be arranged side by side in the interface and can integrate with each other.

JupyterLab is new but ready for use now. As of release 0.33, JupyterLab is the default interface to Jupyter on both hubs. If you prefer the "classic" interface, select "Launch Classic Notebook" from the JupyterLab Help menu, or change the URL from /lab to /tree.

Conda Environments as Kernels

You can use one of our default Python 2, Python 3, or R kernels. If you have a Conda environment, it may already appear in your list of kernels, depending on how it was installed. If not, use the following procedure to create a custom kernel from a Conda environment. In this example, a user with username user creates a Conda environment on Cori and uses it from Jupyter.

cori$ module load python
cori$ conda create -n myenv python=3.7 ipykernel <further-packages-to-install>
<... installation messages ...>
cori$ source activate myenv
cori$ python -m ipykernel install --user --name myenv --display-name MyEnv
Installed kernelspec myenv in /global/u1/u/user/.local/share/jupyter/kernels/myenv
cori$

Be sure to specify which version of the Python interpreter you want installed in the environment. The ipykernel install command creates a JSON file called a "kernel spec" (kernel.json) in the directory reported in its output. It looks like this:

{
    "argv": [
        "/global/homes/u/user/.conda/envs/myenv/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ],
    "display_name": "MyEnv",
    "language": "python"
}
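The install command is the supported way to create this file. If you want to script the same step for several environments, here is a minimal, standard-library-only sketch that writes an equivalent kernel.json. The function name write_conda_kernelspec is hypothetical. The sketch assumes the directory layout shown in the install output above (<kernels_dir>/<name>/kernel.json) and does nothing beyond writing that file.

```python
import json
from pathlib import Path


def write_conda_kernelspec(kernels_dir, name, display_name, python_path):
    """Write <kernels_dir>/<name>/kernel.json for a Conda env's Python.

    Produces the same fields as the kernel.json shown above.
    """
    spec = {
        "argv": [
            str(python_path),
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",  # Jupyter fills this in when it starts the kernel
        ],
        "display_name": display_name,
        "language": "python",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    spec_path = kernel_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=4) + "\n")
    return spec_path


# Example (writes into your real per-user kernel directory):
# write_conda_kernelspec(
#     Path.home() / ".local/share/jupyter/kernels",
#     "myenv", "MyEnv", Path.home() / ".conda/envs/myenv/bin/python",
# )
```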

Tip

If you previously relied on nb_conda_kernels to automatically discover Conda environments and make kernels from them, please note that we have retired this plug-in. It produced duplicate kernel entries and confused many users. Please use the procedure above to create your Jupyter kernels from Conda environments explicitly.
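If your kernel list still shows duplicates after moving away from nb_conda_kernels, look for per-user kernel specs that share a display name. The sketch below is hypothetical tooling, not a NERSC utility. It scans only one directory (for example ~/.local/share/jupyter/kernels) and ignores system-wide kernels.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_duplicate_kernels(kernels_dir):
    """Map each display_name used by more than one kernel spec to its spec dirs."""
    by_name = defaultdict(list)
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        try:
            spec = json.loads(spec_file.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed specs
        display = spec.get("display_name", spec_file.parent.name)
        by_name[display].append(spec_file.parent.name)
    return {name: dirs for name, dirs in by_name.items() if len(dirs) > 1}


# Example:
# print(find_duplicate_kernels(Path.home() / ".local/share/jupyter/kernels"))
```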

Customizing Kernels

Here is an example kernel spec for a user who needs executables from a custom PATH and shared libraries from a custom LD_LIBRARY_PATH. These settings go in an env dictionary:

{
    "argv": [
        "/global/homes/u/user/.conda/envs/myenv/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ],
    "display_name": "MyEnv",
    "language": "python",
    "env": {
        "PATH":
            "/global/homes/u/user/other/bin:/usr/local/bin:/usr/bin:/bin",
        "LD_LIBRARY_PATH":
            "/global/cfs/cdirs/myproject/lib:/global/homes/u/user/lib"
    }
}
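Because the env values are used exactly as written, one option is to generate them from your current shell environment when you create the spec. This minimal sketch uses a hypothetical helper, env_with_prepended. The values are captured once, when you run it, and will not track later changes to your environment.

```python
import json
import os


def env_with_prepended(extra_path=(), extra_ld=(), base_env=None):
    """Build an "env" dict that puts extra dirs ahead of the current values.

    base_env defaults to this process's environment. The result is a
    snapshot taken now; it does not update when your shell setup changes.
    """
    base = os.environ if base_env is None else base_env
    env = {}
    for var, extra in (("PATH", extra_path), ("LD_LIBRARY_PATH", extra_ld)):
        if not extra:
            continue  # leave variables you don't customize out of the spec
        current = base.get(var, "")
        parts = list(extra) + ([current] if current else [])
        env[var] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    spec_env = env_with_prepended(
        extra_path=["/global/homes/u/user/other/bin"],
        extra_ld=["/global/cfs/cdirs/myproject/lib", "/global/homes/u/user/lib"],
    )
    # Paste the printed dictionary into kernel.json as its "env" entry
    print(json.dumps({"env": spec_env}, indent=4))
```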

Customizing Kernels with a Helper Shell Script

Note, however, that these values replace the existing PATH and LD_LIBRARY_PATH rather than prepending or appending to them. To use them, you would probably have to copy your entire path or library path into the kernel spec, which is quite inconvenient. Instead, you can use a trick that takes advantage of a helper shell script:

{
    "argv": [
        "{resource_dir}/kernel-helper.sh",
        "-f",
        "{connection_file}"
    ],
    "display_name": "Custom Env",
    "language": "python"
}

Create the kernel-helper.sh script in the same directory as the kernel.json file. Jupyter substitutes the path to that directory for {resource_dir}, which makes it a convenient way to reference the script. Make the script executable with chmod u+x kernel-helper.sh.

Sometimes users want LaTeX fonts for Matplotlib. As an example, here's a kernel helper script that makes that work with the texlive module. Use this helper in conjunction with the above kernel.json:

#!/bin/bash -l
module load texlive
module load python
exec python -m ipykernel_launcher "$@"

You can put anything you want to configure your environment in the helper script. This can include environment variables, module loads, or conda environment activations. Just make sure it ends with the ipykernel_launcher command.
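These steps can also be scripted: write kernel.json with {resource_dir}, put kernel-helper.sh next to it, and make it executable. Here is a minimal sketch, assuming the texlive helper shown above. The function name install_helper_kernel and the kernel directory name you pass to it are hypothetical.

```python
import json
import stat
from pathlib import Path

# Same contents as the texlive helper script above
HELPER = """#!/bin/bash -l
module load texlive
module load python
exec python -m ipykernel_launcher "$@"
"""


def install_helper_kernel(kernels_dir, name, display_name, helper_text=HELPER):
    """Write kernel.json plus an executable kernel-helper.sh side by side."""
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)

    helper = kernel_dir / "kernel-helper.sh"
    helper.write_text(helper_text)
    # Equivalent of `chmod u+x kernel-helper.sh`
    helper.chmod(helper.stat().st_mode | stat.S_IXUSR)

    spec = {
        # Jupyter replaces {resource_dir} with this kernel's directory
        "argv": ["{resource_dir}/kernel-helper.sh", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    (kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=4) + "\n")
    return kernel_dir
```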

Shifter Kernels on Jupyter

Shifter works with Cori notebook servers, but not with Spin notebook servers. To use it, create a kernel spec that runs shifter. Use the path to Python inside your image as the executable, and place the kernel spec at ~/.local/share/jupyter/kernels/<my-shifter-kernel>/kernel.json (you do not need to create a Conda environment for this). Note that ipykernel must be installed in your container image.

Here's an example of how to set up the kernel spec:

{
    "argv": [
        "shifter",
        "--image=continuumio/anaconda3:latest",
        "/opt/conda/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ],
    "display_name": "my-shifter-kernel",
    "language": "python"
}
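If you maintain kernels for several images, generating the spec can be less error-prone than hand-editing JSON. The hypothetical helper shifter_kernelspec below reproduces the example above. It assumes that the interpreter path you pass exists inside the image and has ipykernel installed.

```python
import json


def shifter_kernelspec(image, python_path, display_name):
    """Build a kernel spec that starts ipykernel inside a Shifter image.

    python_path is the interpreter *inside* the image.
    """
    return {
        "argv": [
            "shifter",
            f"--image={image}",
            python_path,
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",
        ],
        "display_name": display_name,
        "language": "python",
    }


if __name__ == "__main__":
    spec = shifter_kernelspec(
        "continuumio/anaconda3:latest", "/opt/conda/bin/python", "my-shifter-kernel"
    )
    print(json.dumps(spec, indent=4))
```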

Spark on Jupyter

You can run small instances (< 4 cores) of Spark on Cori with Jupyter, even inside Shifter. Create the following kernel spec, replacing <your_dir> so that the paths point into your $SCRATCH directory. You'll need to create the $SCRATCH/tmpfiles and $SCRATCH/spark/event_logs directories first:

{
    "display_name": "shifter pyspark",
    "language": "python",
    "argv": [
        "shifter",
        "--image=nersc/spark-2.3.0:v1",
        "--volume=/global/cscratch1/sd/<your_dir>/tmpfiles:/tmp:perNodeCache=size=200G",
        "/root/anaconda3/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ],
    "env": {
        "SPARK_HOME": "/usr/local/bin/spark-2.3.0/",
        "PYSPARK_SUBMIT_ARGS": "--master local[1] --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:///global/cscratch1/sd/<your_dir>/spark/event_logs --conf spark.history.fs.logDirectory=file:///global/cscratch1/sd/<your_dir>/spark/event_logs pyspark-shell",
        "PYTHONSTARTUP": "/usr/local/bin/spark-2.3.0/python/pyspark/shell.py",
        "PYTHONPATH": "/usr/local/bin/spark-2.3.0/python/lib/py4j-0.10.6-src.zip:/usr/local/bin/spark-2.3.0/python/",
        "PYSPARK_PYTHON": "/root/anaconda3/bin/python",
        "PYSPARK_DRIVER_PYTHON": "ipython3",
        "JAVA_HOME": "/usr"
    }
}
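Two details in the Spark spec are easy to get wrong. First, the scratch directories must exist before the kernel starts. Second, PYSPARK_SUBMIT_ARGS must be a single JSON string (JSON strings cannot contain literal line breaks) whose last token is pyspark-shell, because spark-submit treats anything after the primary resource as application arguments. Here is a minimal sketch under those assumptions. prepare_spark_dirs and pyspark_submit_args are hypothetical names, and the script acts only when $SCRATCH is set.

```python
import json
import os
from pathlib import Path


def prepare_spark_dirs(scratch):
    """Create <scratch>/tmpfiles and <scratch>/spark/event_logs if missing."""
    tmp = Path(scratch) / "tmpfiles"
    logs = Path(scratch) / "spark" / "event_logs"
    for d in (tmp, logs):
        d.mkdir(parents=True, exist_ok=True)
    return tmp, logs


def pyspark_submit_args(event_log_dir, master="local[1]"):
    """Return a one-line PYSPARK_SUBMIT_ARGS value ending in pyspark-shell.

    event_log_dir should be absolute, so file:// + path gives file:///...
    """
    confs = {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": f"file://{event_log_dir}",
        "spark.history.fs.logDirectory": f"file://{event_log_dir}",
    }
    args = ["--master", master]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    args.append("pyspark-shell")  # must be the final token
    return " ".join(args)


if __name__ == "__main__" and "SCRATCH" in os.environ:
    _, logs = prepare_spark_dirs(os.environ["SCRATCH"])
    # json.dumps escapes the value so it can be pasted into kernel.json
    print(json.dumps({"PYSPARK_SUBMIT_ARGS": pyspark_submit_args(logs)}, indent=4))
```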

Debugging Jupyter Problems

At NERSC, users launch Jupyter notebooks after authenticating to JupyterHub. Logs from a user's notebook process appear in a file called .jupyter.log in the user's $HOME directory. These logs can be very helpful when it comes to debugging issues with Jupyter, your custom kernels, or your Python environment. One of the first things we do when investigating Jupyter tickets is consult this log file.

Need more information in the log file? You can control how verbose the logging is with the c.Application.log_level setting in your Jupyter notebook config file. If you don't have a config file yet, you can create one by running

/usr/common/software/jupyter/19-11/bin/jupyter notebook --generate-config

Open the generated configuration file (the command prints its location), uncomment c.Application.log_level, and set its value to, say, 'DEBUG' (numeric level 10) for debug-level information. Jupyter uses Python's standard logging module, so the standard level names and numbers apply.
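If you prefer to make this change from a script, the sketch below uncomments (or appends) the setting in a config file. It assumes the generated file contains a line like #c.Application.log_level = 30, and it sets the level by name. enable_debug_logging is a hypothetical helper; pass it the path that --generate-config printed.

```python
import re
from pathlib import Path

# Matches the (possibly commented-out) log_level assignment on its own line
LOG_LEVEL_LINE = re.compile(r"^#?[ \t]*c\.Application\.log_level[ \t]*=.*$", re.MULTILINE)


def enable_debug_logging(config_path, level="DEBUG"):
    """Set c.Application.log_level in a Jupyter config file.

    Replaces the first existing (commented or not) assignment, or appends
    one if none is found.
    """
    path = Path(config_path)
    text = path.read_text()
    new_line = f"c.Application.log_level = {level!r}"
    new_text, count = LOG_LEVEL_LINE.subn(new_line, text, count=1)
    if count == 0:
        new_text = text.rstrip("\n") + "\n" + new_line + "\n"
    path.write_text(new_text)
```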

Help Us Help You

You might save yourself a lot of time by looking at this log file yourself before opening a ticket. If you see anything that looks particularly important, point it out in your ticket.

And as always, be sure to be as specific as possible in tickets you file about Jupyter. For example, if you have an issue with a particular kernel or Conda environment, let us know which one it is.
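To skim .jupyter.log before filing a ticket, a small script can pull out error-level lines and the start of Python tracebacks. This sketch assumes log lines begin with a bracketed level letter such as [E or [C, as in Jupyter's default log format. Lines that don't follow that format (for example, raw kernel output) are caught only if they contain a traceback header. problem_lines is a hypothetical helper.

```python
import re
from pathlib import Path

# Error/critical lines like "[E 2020-10-20 ..." or any traceback header
PROBLEM = re.compile(r"^\[(?:E|C)\s|Traceback \(most recent call last\)")


def problem_lines(log_path, last_n=2000):
    """Return (line_number, text) pairs for likely problems in the last_n lines."""
    lines = Path(log_path).read_text(errors="replace").splitlines()
    start = max(0, len(lines) - last_n)
    return [
        (i + 1, line)
        for i, line in enumerate(lines[start:], start=start)
        if PROBLEM.search(line)
    ]


if __name__ == "__main__":
    log = Path.home() / ".jupyter.log"
    if log.exists():
        for lineno, text in problem_lines(log):
            print(f"{lineno}: {text}")
```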

Spawn failed: HTTP 507

HTTP 507 means "insufficient storage". If you see this error, the most likely cause is that you are over quota in your $HOME (which you can check with myquota) and so Jupyter is unable to create its startup files. Try removing some files from $HOME until you are a few hundred MB below quota, and then start the Jupyter server again.
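myquota tells you whether you are over quota, but not where the space went. Here is a rough sketch for finding the largest top-level entries in $HOME. It sums apparent file sizes (st_size) without following symlinks, so its totals can differ from what the quota system counts. largest_entries and tree_size are hypothetical helpers.

```python
import os


def tree_size(path):
    """Total apparent bytes of files under path, not following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable
    return total


def largest_entries(top, n=10):
    """Return the n largest top-level entries of top as (bytes, name) pairs."""
    sizes = []
    for entry in os.scandir(top):
        try:
            if entry.is_dir(follow_symlinks=False):
                size = tree_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
        except OSError:
            continue
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:n]


# Usage (can take a while on a large $HOME):
# for size, name in largest_entries(os.path.expanduser("~")):
#     print(f"{size / 2**20:10.1f} MiB  {name}")
```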

Unexpected error while saving and disk I/O error in Jupyter

If you try to save or create a new notebook and see an error like "Unexpected error while saving file: <path-to-notebook> disk I/O error", you may simply be over quota. Use a terminal tab or ssh into Cori and run myquota to verify. Then delete data, or archive or move it and then delete it, to make enough space for your notebook. If the error occurs while saving a notebook on the Community File System (/global/cfs), use the cfsquota tool to check whether you are over quota there.

Python 3.8 Users on GPFS File Systems may observe BlockingIOErrors

Starting in October 2020, users of Python and Jupyter at NERSC began observing BlockingIOError failures. For Jupyter users, this usually interrupts file save or checkpoint operations. The issue has been identified and is being addressed by the vendor. You can read more about it in the Python FAQ.