FAQ and Troubleshooting
Running `shifter --help` is a useful first step: it prints the command's usage and options.
Exec format error
Users who build images on non-x86 hardware may see an error like this:

```
shifter: /bin/bash: Exec format error
```
To fix this, try a multi-architecture build that includes `linux/amd64`. Here is an example using Docker Buildx:

```bash
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64/v8 --push -t elvis/image:latest .
```
This example both builds the cross-platform image and pushes it to the registry. To verify that the build worked as intended, check the image metadata in the registry (for example, Docker Hub) and confirm that the expected architectures are listed.
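To script that verification, one option is to inspect the pushed image with `docker buildx imagetools inspect`, which lists the platforms in a multi-arch image. The sketch below assumes that command's default text output, with one `Platform:` line per manifest; the function names `has_platform` and `check_amd64` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the text on stdin (assumed to be the default
# output of `docker buildx imagetools inspect IMAGE`) has a "Platform:" line
# that exactly matches PLATFORM.
has_platform() {
  local platform=$1
  grep -Eq "^[[:space:]]*Platform:[[:space:]]+${platform}\$"
}

# Hypothetical wrapper: inspect IMAGE in the registry and report whether the
# linux/amd64 variant needed on x86 nodes is present.
check_amd64() {
  local image=$1
  if docker buildx imagetools inspect "$image" | has_platform linux/amd64; then
    echo "$image includes linux/amd64"
  else
    echo "$image does not list linux/amd64" >&2
    return 1
  fi
}
```

For example, `check_amd64 elvis/image:latest` reports whether the image pushed above includes an amd64 variant. Matching the whole line keeps `linux/arm64` from matching `linux/arm64/v8` by accident.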
Failed to lookup Image
If you are trying to start many tasks at the same time with Shifter, this can create congestion on the image gateway.
If all the processes will use the same image, you can avoid this by specifying the image in the batch submit script instead of on the command line:

```bash
#SBATCH --image=myimage:latest
shifter /path/to/app arg1 arg2
```
Using this format, the image will be looked up at submission time and cached as part of the job.
If your job needs to use multiple images during execution, the approach above will not be sufficient. A workaround is to specify the image by its ID, which avoids the lookup. Specify the image as `id:` followed by the image ID, which you can obtain with `shifterimg lookup`. Do the lookup in advance so that it does not occur during the job.
```
# Done in advance...
user:~> shifterimg lookup centos:8
76d24f3ba3317fa945743bb3746fbaf3a0b752f10b10376960de01da70685fbd

# In the job...
shifter --image=id:76d24f3ba3317fa945743bb3746fbaf3a0b752f10b10376960de01da70685fbd /bin/hostname
```
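The advance lookup can be wrapped in a small helper that fails loudly when no ID comes back, so a bad tag is caught before the job is submitted. This is a sketch: `resolve_image_id` is a hypothetical name, and it assumes `shifterimg lookup` prints only the 64-character hex ID, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: resolve an image tag to its Shifter image ID before the
# job runs. Assumes `shifterimg lookup` prints only the 64-character hex ID.
resolve_image_id() {
  local id re='^[0-9a-f]{64}$'
  id=$(shifterimg lookup "$1") || return 1
  # Reject anything that does not look like an image ID (e.g. an error message).
  if [[ ! $id =~ $re ]]; then
    echo "unexpected lookup output for $1: $id" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

For example, `IMAGE_ID=$(resolve_image_id centos:8) && sbatch --export=ALL,IMAGE_ID="$IMAGE_ID" job.sh` resolves the ID once at submission time and passes it to a hypothetical `job.sh`, which can then run `shifter --image="id:${IMAGE_ID}" /bin/hostname`.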
Invalid Volume Map
Sometimes mounting a directory with `--volume` fails with `invalid volume map` or with this error:

```
ERROR: unclean exit from bind-mount routine. /var/udiMount/tmp may still be mounted.
BIND MOUNT FAILED from /var/udiMount/<full path to directory> to /var/udiMount/tmp
FAILED to setup user-requested mounts.
FAILED to setup image.
```
This can happen for several reasons, but a common cause is the permissions on the directory being mounted. Take this example:

```bash
shifter --volume /global/cfs/cdirs/myproj/a/b --image=myimage bash
```
For Shifter to allow the mount, the user `nobody` must be able to traverse the path down to the last directory. The easiest way to fix this is to use `setfacl` to grant `nobody` limited (execute-only) access to the directories along the path. This needs to be done for each directory in the path up to the final directory. For example:
```bash
setfacl -m u:nobody:x /global/cfs/cdirs/myproj/
setfacl -m u:nobody:x /global/cfs/cdirs/myproj/a
```
Note that only the owner of a directory can change the access controls, so you may need the project owner to fix some path elements.
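For deeper paths, listing each directory by hand gets tedious. The following sketch computes every directory from a starting point (such as the project directory) down to the parent of the mounted directory and runs the same `setfacl -m u:nobody:x` command on each. The helper names and the `DRY_RUN` variable are hypothetical; which starting directory to pass depends on which parts of the path `nobody` can already traverse.

```shell
#!/bin/bash
# Hypothetical helper: print ROOT and every directory below it on the way to
# TARGET, excluding TARGET itself.
acl_path_dirs() {
  local root=${1%/} target=${2%/}
  local dir=$target
  local -a dirs=()
  # Guard against ROOT=/ (which strips to "") to avoid looping forever.
  [ -n "$root" ] || { echo "error: ROOT must not be /" >&2; return 1; }
  case $target in
    "$root"/*) ;;
    *) echo "error: $target is not below $root" >&2; return 1 ;;
  esac
  # Walk upward from TARGET, prepending each parent until ROOT is reached.
  while [ "$dir" != "$root" ]; do
    dir=$(dirname "$dir")
    dirs=("$dir" "${dirs[@]}")
  done
  printf '%s\n' "${dirs[@]}"
}

# Grant user nobody execute-only (traverse) access on each of those directories.
# With DRY_RUN=1, print the setfacl commands instead of running them.
apply_nobody_acl() {
  local dirs d
  dirs=$(acl_path_dirs "$1" "$2") || return 1
  while IFS= read -r d; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo setfacl -m u:nobody:x "$d"
    else
      setfacl -m u:nobody:x "$d" || return 1
    fi
  done <<< "$dirs"
}
```

For the example above, `DRY_RUN=1 apply_nobody_acl /global/cfs/cdirs/myproj /global/cfs/cdirs/myproj/a/b` prints the two `setfacl` commands shown earlier; dropping `DRY_RUN=1` runs them. Afterwards, `getfacl` on each directory should show a `user:nobody:--x` entry. As noted above, the command fails on directories you do not own.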
GLIBC_2.25 not found
This error typically contains the following line, although other variations may appear:

```
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by /opt/udiImage/modules/mpich/mpich-7.7.19/lib64/dep/libexpat.so.1)
```
By default, Shifter automatically injects libraries to support MPI and, where applicable, GPUs. These libraries can conflict with the contents of the image if the image uses an older version of GLIBC. If the application doesn't require MPI support, you can try adding the flag `--module=none` to disable the injection:

```
elvis@nid00042:~> shifter --image=elvis/test:123 --module=none /bin/bash
```
If your application requires MPI support and you are running on Cori, you can use the flag `--module=mpich-cle6`. This loads an older version of the MPI libraries that was built with an older GLIBC version. Currently, NERSC does not have a similar module version for Perlmutter.

```
elvis@nid00042:~> shifter --image=elvis/test:123 --module mpich-cle6 /bin/bash
```
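To confirm that an older GLIBC in the image is the cause, you can compare the image's own GLIBC version (with injection disabled) against the version named in the error. This sketch assumes a glibc-based image whose `ldd --version` output ends its first line with the version number; the helper names are hypothetical, and `version_ge` relies on GNU `sort -V`.

```shell
#!/bin/bash
# Hypothetical helper: print the GLIBC version from `ldd --version` output on
# stdin, e.g. "ldd (GNU libc) 2.17" -> "2.17" (last field of the first line).
glibc_from_ldd() {
  head -n 1 | awk '{print $NF}'
}

# Succeed if version A >= version B, using GNU `sort -V` for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: does IMAGE's own GLIBC (module injection disabled) meet
# REQUIRED, e.g. 2.25 for the GLIBC_2.25 error above?
image_glibc_ok() {
  local image=$1 required=$2 found
  found=$(shifter --image="$image" --module=none ldd --version | glibc_from_ldd)
  echo "image glibc: ${found:-unknown}, required: $required"
  [ -n "$found" ] && version_ge "$found" "$required"
}
```

For the error shown above, `image_glibc_ok elvis/test:123 2.25` returns non-zero when the image's GLIBC is older than 2.25, which points to the injection conflict described in this section.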
If you have a Shifter question or problem, please open a ticket at help.nersc.gov.