Jupyter¶
Jupyter is an essential component of NERSC's data ecosystem. Use Jupyter at NERSC to:
- Perform exploratory data analytics and visualization of data stored on the NERSC Global File System (NGF) or in databases at NERSC,
- Guide machine learning through distributed training, hyperparameter optimization, model validation, prediction, and inference,
- Manage workflows involving complex simulations and data analytics through the Cori batch queue,
- ... or do other things we haven't thought of yet.
Jupyter is a flexible, popular literate-computing web application for creating notebooks containing code, equations, visualization, and text. Notebooks are documents that contain both computer code and rich text elements (paragraphs, equations, figures, widgets, links). They are human-readable documents containing analysis descriptions and results but are also executable data analytics artifacts. Notebooks are associated with kernels, processes that actually execute code. Notebooks can be shared or converted into static HTML documents. They are a powerful tool for reproducible research and teaching.
JupyterHub¶
JupyterHub provides a multi-user hub for spawning, managing, and proxying multiple instances of single-user Jupyter notebook servers. At NERSC, you authenticate to the JupyterHub instance we manage using your NERSC credentials and one-time password. Here is a link to NERSC's JupyterHub service: https://jupyter.nersc.gov/
When you log into JupyterHub at NERSC, you will see a console or "home" page with some buttons. These buttons allow you to manage notebook servers running on NERSC systems. Which notebook server should you use? It depends:
- Perlmutter (Experimental! Things here could change quickly with little notice!)
    - Shared CPU Node
        - Spawns Jupyter notebooks on Perlmutter login nodes (the label for this may change in the future)
        - Access to the NERSC Global Filesystem (NGF) and Perlmutter `$SCRATCH` platform storage
        - Notebooks can submit jobs to Perlmutter batch queues
        - Subject to CPU and memory limits since it is a shared resource
    - Exclusive CPU Node
        - Spawns your Jupyter notebook on one of Perlmutter's CPU nodes, allocating you the entire node
        - Provides NGF and Perlmutter `$SCRATCH` access
        - Notebook runs in a single-node batch job with a time limit of 6 hours
        - Usage charged to your default project (when charging begins in AY22)
    - Exclusive GPU Node
        - Spawns your Jupyter notebook on one of Perlmutter's GPU nodes, allocating you all 4 GPUs
        - Provides NGF and Perlmutter `$SCRATCH` access
        - Notebook runs in a single-node batch job with a time limit of 6 hours
        - Usage charged to your default GPU project (when charging begins in AY22)
    - Configurable GPU Jobs
        - Like Exclusive GPU Node, but gives you exclusive access to 4 GPU nodes (16 GPUs)
        - Job options are configurable through a form (charge account, time limit, reservation, etc.)
- Cori
    - Shared CPU Node
        - Spawns Jupyter notebooks on special-purpose large-memory nodes of Cori (cori13, cori14, cori16, cori19)
        - Access to NGF and Cori `$SCRATCH` platform storage
        - Default Python software environment is the same as one of the modules found on Cori
        - Notebooks can submit jobs to Cori batch queues
        - Subject to CPU and memory limits since it is a shared resource
Using Jupyter at NERSC for Events¶
Jupyter at NERSC can be used for demos, tutorials, or workshops. You can even use training accounts with Jupyter at NERSC. If you plan to use Jupyter in this way, we ask that you observe the following guidelines:
- If 20 people or fewer at your event will be logging into jupyter.nersc.gov, there's no need to let us know ahead of time. We should be able to handle that level of increased load without any issues. Just be sure you don't schedule your event on a day with scheduled maintenance.
- For events where more than 20 people are logging in, please send us a heads up at least 1 month in advance via ticket. We've been able to absorb events of 50-100 people without any issues, but we still want to know about your event. This lets us keep an eye on things while your event is running and helps keep things going smoothly.
- In either case, please let us know if you have any special requirements or would like to do something more experimental. That will likely require more lead time, but we're willing to work with you if there aren't already similar events coming up. In this case, please contact us at least 2 months in advance via ticket.
These are not hard and fast rules, but we're more likely to be able to help if we have advance notice.
Conda Environments as Kernels¶
The default NERSC Python kernel loads the default Python module on whatever system is being used. This behavior is new as of AY 2022 and makes the default kernel more consistent with the command-line environment.
You can use one of our default Python, Julia, or R kernels, or you can follow the procedure below to enable a custom kernel based on your own Conda environment. Let's assume you are a user with username `user` who wants to create a Conda environment on Cori and use it from Jupyter.

```
cori$ module load python
cori$ conda create -n myenv python=3.9 ipykernel <further-packages-to-install>
<... installation messages ...>
cori$ source activate myenv
cori$ python -m ipykernel install --user --name myenv --display-name MyEnv
Installed kernelspec myenv in /global/u1/u/user/.local/share/jupyter/kernels/myenv
cori$
```

Be sure to specify which version of the Python interpreter you want installed. The install command creates a JSON file called a "kernel spec" (`kernel.json`) at the path shown in its output:
```json
{
 "argv": [
  "/global/homes/u/user/.conda/envs/myenv/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "MyEnv",
 "language": "python"
}
```
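If you want to sanity-check a kernel spec programmatically, it is ordinary JSON. Here is a minimal sketch using only the Python standard library; the interpreter path and names are hypothetical placeholders, not NERSC-specific values:

```python
import json

# Hypothetical interpreter path and names; substitute your own.
spec = {
    "argv": [
        "/global/homes/u/user/.conda/envs/myenv/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}",
    ],
    "display_name": "MyEnv",
    "language": "python",
}

# Round-trip through JSON, as Jupyter would read it from kernel.json,
# and check that the required pieces are present.
parsed = json.loads(json.dumps(spec, indent=1))
assert parsed["argv"][-1] == "{connection_file}"
assert parsed["language"] == "python"
```

The same round-trip check is a quick way to catch a typo (a missing comma, say) after hand-editing a `kernel.json`.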
Customizing Kernels¶
Here is an example kernel spec where the user needs other executables from a custom `PATH` and shared libraries in `LD_LIBRARY_PATH`. These are simply included in an `env` dictionary:
```json
{
 "argv": [
  "/global/homes/u/user/.conda/envs/myenv/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "MyEnv",
 "language": "python",
 "env": {
  "PATH": "/global/homes/u/user/other/bin:/usr/local/bin:/usr/bin:/bin",
  "LD_LIBRARY_PATH": "/global/cfs/cdirs/myproject/lib:/global/homes/u/user/lib"
 }
}
```
Note, however, that these environment variables replace your existing `PATH` or `LD_LIBRARY_PATH` settings; they do not prepend or append to them. To use them you would probably have to copy your entire path or library path into the spec, which is quite inconvenient. A more flexible choice is a helper shell script, described next.
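If you generate your kernel spec with a script, one workaround is to capture the full `PATH` at generation time, so the replacement value already contains everything you need. A sketch, where the extra directory is hypothetical:

```python
import os

# Hypothetical directory to put ahead of the existing search path.
extra_bin = "/global/homes/u/user/other/bin"

# The kernel spec's "env" replaces PATH outright, so bake the full,
# prepended value in now rather than relying on prepend semantics.
env = {"PATH": extra_bin + os.pathsep + os.environ.get("PATH", "")}
```

The resulting `env` dictionary can then be merged into the kernel spec before it is written out. Note that the value is frozen at generation time, so you would need to regenerate the spec whenever your environment changes.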
Customizing Kernels with a Helper Shell Script¶
Instead, you can take advantage of a helper shell script. Navigate to the `kernel.json` file for the given environment and prepend one additional argument to the `argv` list as shown:
```json
{
 "argv": [
  "{resource_dir}/kernel-helper.sh",
  "python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Custom Env",
 "language": "python"
}
```
Then create a `kernel-helper.sh` script in the same directory as the `kernel.json` file. The `{resource_dir}` variable is a convenient way to tell Jupyter to substitute in the path to that directory. Make the `kernel-helper.sh` script executable (`chmod u+x kernel-helper.sh`).
As an example, here is a kernel helper script that works with the `example` module (not a real module). Use this helper in conjunction with the above `kernel.json`:
```bash
#!/bin/bash
module load example
module load python
exec "$@"
```
You can put anything you need to configure your environment in the helper script: environment variables, module loads, Conda environment activations, and so on. Just make sure it ends with the `exec` line.
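For instance, a helper that sets an environment variable and (on a NERSC system) loads modules might look like the following; the variable value and the commented-out lines are illustrative, not prescriptive:

```shell
#!/bin/bash
# Hypothetical kernel-helper.sh: configure the environment, then hand
# control to the kernel command that Jupyter passes as arguments.
export OMP_NUM_THREADS=4            # example environment variable
# module load example               # uncomment on a system with modules
# source activate myenv             # hypothetical Conda environment
exec "$@"
```

Because the script ends with `exec "$@"`, the kernel process replaces the helper shell, so Jupyter sees the kernel directly rather than a wrapper.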
Shifter Kernels on Jupyter¶
Shifter works with Cori notebook servers. To use it, create a kernel spec and edit it to run `shifter`. Use the path to Python in your image as the executable, and place the kernel spec at `~/.local/share/jupyter/kernels/<my-shifter-kernel>/kernel.json` (you do not need to create a Conda environment for this). Note that `ipykernel` must be installed in your container.
Here's an example of how to set up the kernel spec:
```json
{
 "argv": [
  "shifter",
  "--image=continuumio/anaconda3:latest",
  "/opt/conda/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "my-shifter-kernel",
 "language": "python"
}
```
Spark on Jupyter¶
You can run small instances (< 4 cores) of Spark on Cori with Jupyter, including via Shifter. Create the following kernel spec (you'll need to create the `$SCRATCH/tmpfiles` and `$SCRATCH/spark/event_logs` directories first):
```json
{
 "display_name": "shifter pyspark",
 "language": "python",
 "argv": [
  "shifter",
  "--image=nersc/spark-2.3.0:v1",
  "/root/anaconda3/bin/python",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "SPARK_HOME": "/usr/local/bin/spark-2.3.0/",
  "PYSPARK_SUBMIT_ARGS": "--master local[1] --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:///global/cscratch1/sd/<your_dir>/spark/event_logs --conf spark.history.fs.logDirectory=file:///global/cscratch1/sd/<your_dir>/spark/event_logs pyspark-shell",
  "PYTHONSTARTUP": "/usr/local/bin/spark-2.3.0/python/pyspark/shell.py",
  "PYTHONPATH": "/usr/local/bin/spark-2.3.0/python/lib/py4j-0.10.6-src.zip:/usr/local/bin/spark-2.3.0/python/",
  "PYSPARK_PYTHON": "/root/anaconda3/bin/python",
  "PYSPARK_DRIVER_PYTHON": "ipython3",
  "JAVA_HOME": "/usr"
 }
}
```
Using ipympl in Your Kernels (Matplotlib Jupyter Integration)¶
Leveraging the Jupyter interactive widgets framework, `ipympl` enables the interactive features of Matplotlib in Jupyter notebooks and in JupyterLab. Getting this to work in a Jupyter kernel under JupyterLab at NERSC currently requires that you install the same version of `ipympl` in your kernel as NERSC installs in JupyterLab. This is a known issue, and the `ipympl` developers are working on a solution.
For now, you need to know which version of `ipympl` to install into your kernel. Starting from the beginning of the 2022 allocation year, the version of `ipympl` installed into JupyterLab will match the version installed in the default Python module on both Cori and Perlmutter. You can find the version you should use by running `module load python` followed by `conda list ipympl` at the command line. You should see output like the following:
```
# packages in environment at ...:
#
# Name                    Version           Build            Channel
ipympl                    0.8.6             pyhd8ed1ab_0     conda-forge
```
You can use the `conda` tool to install the matching version from the `conda-forge` channel into your environment:

```
conda install -c conda-forge ipympl=0.8.6
```
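Once the matching version of `ipympl` is installed in your kernel, you enable the interactive backend at the top of a notebook cell with the `%matplotlib widget` magic, for example:

```python
%matplotlib widget
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))   # figure renders with interactive pan/zoom controls
```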
Debugging Jupyter Problems¶
Logs can be very helpful when it comes to debugging issues with Jupyter, your custom kernels, or your Python environment. One of the first things we do when investigating Jupyter tickets is consult your log files.
- Logs from shared-node Jupyter processes appear in `.jupyter-$NERSC_HOST.log` in your `$HOME` directory.
    - Log file names are parameterized by system name to prevent collisions between the multiple systems where Jupyter can run.
    - At startup, if you already have one of these log files in place and it is big (1 GB), it is deleted and a new one is started.
    - Successive Jupyter sessions append to this log file, to help with debugging and keep some history.
- Logs from Jupyter processes where the batch system is engaged on the back end (GPUs, exclusive CPU nodes, etc.) are written to `slurm-*.log` files that appear in your `$HOME` directory while the job is running. These can be deleted once the job is complete.
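For example, to check the tail of the shared-node log from a terminal (`$NERSC_HOST` is set automatically on NERSC systems; the fallback here is only so the sketch runs elsewhere):

```shell
# Print the last entries of the per-system Jupyter log, if present.
log="$HOME/.jupyter-${NERSC_HOST:-unknown}.log"
if [ -f "$log" ]; then
    tail -n 100 "$log"
else
    echo "No Jupyter log found at $log"
fi
```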
Help Us Help You
You might save yourself a lot of time by looking at these log files before opening a ticket. If you see anything that seems particularly important, highlight it in your ticket.
And as always, be sure to be as specific as possible in tickets you file about Jupyter. For example, if you have an issue with a particular kernel or Conda environment, let us know which one it is.
Spawn failed: HTTP 507, Insufficient Storage¶
If you see this error, your `$HOME` directory is over quota. When you are over quota, Jupyter cannot write or update the files it needs to manage in your `$HOME` directory, and it will not function.
To fix this, you need to:
- Log in using `ssh` or use NoMachine / NX.
- Run the command `showquota`.
    - This will show you how much you are over quota.
    - You may be over on space, inodes, or both (probably space).
- Remove or migrate some files from your `$HOME` directory:
    - Conda environments can eat up your quota. Consider these tips.
    - If you have a lot of data in `$HOME`, consider moving it to the Community File System or archiving it to HPSS.
- Use `showquota` to check whether you're back under quota.
- Try starting your Jupyter notebook server again from the Hub.
Unexpected error while saving and disk I/O error in Jupyter¶
If you try to save or create a new notebook and you see an error like `Unexpected error while saving file: <path-to-notebook> disk I/O error`, you may simply be over quota. Use a terminal tab or `ssh` into Cori and run `showquota` to verify. Then delete data, or archive/move it and then delete it, to make enough space for your notebook. If the error arises while saving a notebook on `/cfs/`, use the `cfsquota` tool to see whether you are over quota there.
Experimental Features¶
From time to time, NERSC deploys experimental Jupyter features for testing by users. These may be new packages or extensions developed by others that our users may find useful; sometimes they are the result of, or still undergoing, development by NERSC staff and collaborators. Experimental features may work great, may need more work, may turn out not to be viable, or may simply go unused. To help users understand which experimental features are in development or available for testing, and how to opt in or out, we have created a page on experimental features at the NERSC dev system documentation site.