Hyperparameter optimization

Hyperparameter optimization (HPO) is the process of tuning the hyperparameters of your machine learning model, e.g., the learning rate, filter sizes, etc. Several popular algorithms are used for HPO, including grid search, random search, Bayesian optimization, and genetic optimization. Likewise, several libraries and tools implement these algorithms, each with its own tradeoffs in usability, flexibility, and feature support.
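To make the comparison concrete, here is a minimal, framework-agnostic sketch of random search; the search space and the toy objective standing in for a real training run are illustrative assumptions, not part of any library:

import random

# toy search space: each hyperparameter is sampled independently
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),
    "batch_size": lambda: random.choice([32, 64, 128, 256]),
}

def train_and_evaluate(config):
    # stand-in for a real training run; this toy objective
    # peaks at learning_rate = 1e-3
    return -(config["learning_rate"] - 1e-3) ** 2

best_score, best_config = float("-inf"), None
for _ in range(50):  # 50 random trials
    config = {name: sample() for name, sample in search_space.items()}
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)

Grid search would instead enumerate a fixed Cartesian product of values, while Bayesian optimization replaces the independent sampling with a model that proposes promising configurations based on past trials.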

On this page we will collect recommendations and examples for running distributed HPO tasks on our HPC systems.

Weights and Biases

W&B is a great tool for experiment logging and visualization, in addition to HPO. The W&B webpage has documentation and examples: https://wandb.ai/

Additionally, we provide a PyTorch codebase that can serve as a template for logging and HPO with W&B for your deep learning applications (including multi-GPU distributed data parallel applications). See the template here: W&B template for NERSC
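For orientation, here is a minimal sketch of defining and launching an HPO sweep with the W&B Python API; the project name, hyperparameter values, and the placeholder training loop are illustrative assumptions:

import wandb

# hypothetical sweep configuration: random search over two hyperparameters
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-2, 1e-3, 1e-4]},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    # wandb.init() picks up the hyperparameters the sweep chose for this trial
    with wandb.init() as run:
        config = run.config
        # ... train your model using config.learning_rate and config.batch_size ...
        val_loss = 0.0  # placeholder: report your real validation loss here
        run.log({"val_loss": val_loss})

sweep_id = wandb.sweep(sweep_config, project="my-hpo-project")
wandb.agent(sweep_id, function=train, count=10)  # run 10 trials in this process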

KerasTuner

An easy-to-use tool if you're using Keras: https://keras.io/keras_tuner/
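As a quick illustration (assuming TensorFlow/Keras and your own x_train, y_train, x_val, y_val arrays), a KerasTuner random search might look like this minimal sketch:

import keras_tuner
from tensorflow import keras

def build_model(hp):
    # the hp object declares the search space while the model is built
    model = keras.Sequential([
        keras.layers.Dense(
            hp.Int("units", min_value=32, max_value=512, step=32),
            activation="relu",
        ),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
        ),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = keras_tuner.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]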

RayTune

RayTune is an open-source Python library for experiment execution and hyperparameter tuning at any scale. RayTune:

  • supports any ML framework
  • implements state-of-the-art HPO strategies
  • natively integrates with optimization libraries (HyperOpt, BayesianOpt, and Facebook Ax)
  • integrates well with Slurm
  • handles micro-scheduling of trials on multi-GPU node resources (no GPU-binding boilerplate needed)

We provide RayTune in all of our GPU TensorFlow and PyTorch modules and Shifter images. You can also use our slurm-ray-cluster scripts for running multi-GPU node HPO campaigns, and the repo includes a "hello world" MNIST example.
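For orientation, here is a minimal sketch of the classic tune.run API with a toy objective (newer Ray releases also offer a tune.Tuner interface, and the exact reporting call varies by version):

from ray import tune

def objective(config):
    # stand-in for a real training run; report whichever
    # metric you want Tune to optimize
    score = (config["x"] - 3) ** 2
    tune.report(score=score)

analysis = tune.run(
    objective,
    config={"x": tune.uniform(-10, 10)},  # search space
    num_samples=20,                       # number of trials
    # resources_per_trial={"gpu": 1},     # one GPU per trial, scheduled by Tune
)
print(analysis.get_best_config(metric="score", mode="min"))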

HYPPO

HYPPO is a newer tool developed at LBNL and tested on NERSC systems: https://hpo-uq.gitlab.io/

DeepHyper

DeepHyper is a Python package for distributed Hyperparameter Optimization, Neural Architecture Search, and Uncertainty Quantification. It can interface with different backends to distribute computation, such as threads, processes, Ray, and MPI.

If you run into issues, contact Prasanna Balaprakash (pbalapra[at]anl[dot]gov) or open an issue directly on the GitHub repository.

A quick example of the DeepHyper API:

def run(config: dict):
    # objective to maximize: DeepHyper maximizes the returned value,
    # so -x**2 peaks at x = 0
    return -config["x"]**2


# This if statement is necessary: otherwise the script enters an infinite
# loop when the 'run' function is loaded from a subprocess
if __name__ == "__main__":
    from deephyper.problem import HpProblem
    from deephyper.search.hps import CBO
    from deephyper.evaluator import Evaluator

    # define the variable you want to optimize
    problem = HpProblem()
    problem.add_hyperparameter((-10.0, 10.0), "x")

    # define the evaluator to distribute the computation
    evaluator = Evaluator.create(
        run,
        method="process",
        method_kwargs={
            "num_workers": 2,
        },
    )

    # define your search and execute it
    search = CBO(problem, evaluator)

    results = search.search(max_evals=100)
    print(results)

which prints a pandas DataFrame in which the best x is clearly near 0:

         p:x  job_id     objective  timestamp_submit  timestamp_gather
0  -7.744105       1 -5.997117e+01          0.011047          0.037649
1  -9.058254       2 -8.205196e+01          0.011054          0.056398
2  -1.959750       3 -3.840621e+00          0.049750          0.073166
3  -5.150553       4 -2.652819e+01          0.065681          0.089355
4  -6.697095       5 -4.485108e+01          0.082465          0.158050
..       ...     ...           ...               ...               ...
95 -0.034096      96 -1.162566e-03         26.479630         26.795639
96 -0.034204      97 -1.169901e-03         26.789255         27.155481
97 -0.037873      98 -1.434366e-03         27.148506         27.466934
98 -0.000073      99 -5.387088e-09         27.460253         27.774704
99  0.697162     100 -4.860350e-01         27.768153         28.142431