Intro to BO
Summary: Basic doc introducing BO concepts like surrogate models, acquisition functions, etc.

Differential Revision: D69267374
mpolson64 authored and facebook-github-bot committed Feb 19, 2025
1 parent 10c35f3 commit a08fd77
Showing 5 changed files with 126 additions and 1 deletion.
Binary file added docs/assets/ei.png
Binary file added docs/assets/gpei.gif
Binary file added docs/assets/surrogate.png
125 changes: 125 additions & 0 deletions docs/intro-to-bo.md
@@ -0,0 +1,125 @@
---
id: intro-to-bo
title: Introduction to Bayesian Optimization
---

# Introduction to Bayesian Optimization

Bayesian optimization (BO) is a highly effective adaptive experimentation method
that excels at balancing exploration (learning how new parameterizations
perform) and exploitation (refining parameterizations previously observed to be
good). This method is the backbone of Ax's optimization.

BO has seen widespread use across a variety of domains. Notable examples include
its use in
[tuning the hyperparameters of AlphaGo](https://www.nature.com/articles/nature16961),
a landmark model that defeated world champions in the board game Go. In
materials science, researchers used BO to accelerate the curing process,
increase the overall strength, and reduce the CO2 emissions of
[concrete formulations](https://arxiv.org/abs/2310.18288), the most abundant
human-made material in history. In chemistry, researchers used it to
[discover 21 new, state-of-the-art molecules for tunable dye lasers](https://www.science.org/doi/10.1126/science.adk9227)
(frequently used in quantum physics research), including the world’s brightest
molecule, while only a dozen or so had been discovered over the course of
decades.

Ax relies on [BoTorch](https://botorch.org/) for its implementation of
state-of-the-art Bayesian optimization components.

## Bayesian Optimization

Bayesian optimization begins by building a smooth surrogate model of the
outcomes using a statistical model. This surrogate model can be used to make
predictions at unobserved parameterizations and quantify the uncertainty around
them. The predictions and the uncertainty estimates are combined to derive an
acquisition function, which quantifies the value of observing a particular
parameterization. By optimizing the acquisition function we find the most
promising candidate parameterizations, which we then evaluate. The process then
repeats: fit the surrogate model to all data observed so far, optimize the
acquisition function to choose the next configuration to evaluate, observe its
outcome, and refit the surrogate. The entire process is adaptive in the sense
that the predictions and uncertainty estimates are updated as new observations
are made.

The strategy of relying on successive surrogate models to update knowledge of
the objective allows BO to strike a balance between the conflicting goals of
exploration (trying out parameterizations with high uncertainty in their
outcomes) and exploitation (converging on configurations that are likely to be
good). As a result, BO is able to find better configurations with fewer
evaluations than is generally possible with grid search or other global
optimization techniques. This makes it a good choice for applications where a
limited number of function evaluations can be made.

## Surrogate Models

Because the objective function is a black-box process, we treat it as a random
function and place a prior over it. This prior captures beliefs about the
objective, and it is updated as data is observed to form the posterior.

This is typically done using a Gaussian process (GP), a probabilistic model that
defines a probability distribution over possible functions that fit a set of
points. Importantly for Bayesian optimization, GPs can be used to map points in
input space (the parameters we wish to tune) to distributions in output space
(the objectives we wish to optimize).

In the one-dimensional example below, a GP surrogate model is fitted to five
noisy observations and used to predict the objective (solid line) and place
uncertainty estimates (proportional to the width of the shaded bands) over the
entire x-axis, which represents the range of possible parameter values.
Importantly, the model is able to predict the outcome and quantify the
uncertainty of configurations that have not yet been tested. Intuitively, the
uncertainty bands are tight in regions that are well-explored and become wider
as we move away from them.

![GP surrogate model](assets/surrogate.png)
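
To make this concrete, here is a minimal sketch of fitting a GP surrogate with
[BoTorch](https://botorch.org/) (which Ax builds on) and querying its
predictions and uncertainty at unobserved points. The data here are synthetic
and purely illustrative; Ax handles all of this automatically.

```python
# Illustrative sketch only: fit a GP surrogate with BoTorch and query its
# posterior. The training data below is made up for demonstration.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

# Five noisy observations of a one-dimensional objective.
train_X = torch.rand(5, 1, dtype=torch.double)
train_Y = torch.sin(6 * train_X) + 0.1 * torch.randn(5, 1, dtype=torch.double)

# Fit the GP by maximizing the marginal log likelihood.
model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

# Predictions and uncertainty over the whole x-axis, including untested points.
test_X = torch.linspace(0, 1, 101, dtype=torch.double).unsqueeze(-1)
posterior = model.posterior(test_X)
mean = posterior.mean              # predicted objective (solid line)
std = posterior.variance.sqrt()    # uncertainty (width of the shaded bands)
```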

## Acquisition Functions

The acquisition function is a mathematical function that quantifies the utility
of observing a given point in the domain. Ax supports the most commonly used
acquisition functions in BO, including the following (a brief code sketch of
constructing each appears after the list):

- **Expected Improvement (EI)**, which captures the expected amount by which a
  point will improve on the current best observed value.
- **Probability of Improvement (PI)**, which captures the probability of a point
producing an observation better than the current best value.
- **Upper Confidence Bound (UCB)**, which sums the predicted mean and a scaled
  standard deviation, rewarding points that are either promising or uncertain.
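
As a rough illustration, each of these can be constructed directly from
BoTorch's analytic acquisition classes. The snippet below reuses the `model`
and `train_Y` from the surrogate sketch above and is not how Ax wires things up
internally.

```python
# Illustrative only: the three acquisition functions above, built with BoTorch.
# Assumes `model` and `train_Y` from the earlier surrogate sketch.
from botorch.acquisition import (
    ExpectedImprovement,
    ProbabilityOfImprovement,
    UpperConfidenceBound,
)

best_f = train_Y.max()
ei = ExpectedImprovement(model, best_f=best_f)
pi = ProbabilityOfImprovement(model, best_f=best_f)
ucb = UpperConfidenceBound(model, beta=2.0)  # mean + sqrt(beta) * std
```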

Each of these acquisition functions will lead to different behavior during the
optimization. Expected improvement is a popular acquisition function owing to
its natural tendency to both explore regions of high uncertainty and exploit
regions known to be good, an analytic form that is easy to compute, and overall
good practical performance. As the name suggests, it rewards evaluation of the
objective $f$ based on the expected improvement relative to the current best.
If $f^* = \max_i y_i$ is the current best observed outcome and our goal is to
maximize $f$, then EI is defined as follows:

$$
\text{EI}(x) = \mathbb{E}\bigl[\max(f(x) - f^*, 0)\bigr]
$$
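
For a GP surrogate the posterior at any point $x$ is Gaussian with mean
$\mu(x)$ and standard deviation $\sigma(x)$, so in the noiseless setting this
expectation has a simple closed form:

$$
\text{EI}(x) = \sigma(x)\bigl[z\,\Phi(z) + \varphi(z)\bigr],
\qquad z = \frac{\mu(x) - f^*}{\sigma(x)}
$$

where $\Phi$ and $\varphi$ denote the standard normal CDF and PDF. This
analytic form is part of what makes EI cheap to evaluate and optimize.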

A visualization of the expected improvement based on the surrogate model
predictions is shown below, where the next suggestion is where the expected
improvement is at its maximum.

![Expected Improvement (EI) acquisition function](assets/ei.png)

Once the point with the highest EI is selected and evaluated, the surrogate
model is refit on the updated data and a new suggestion is made. This process
continues in a loop until a stopping condition set by the user is reached.

![Full Bayesian optimization loop](assets/gpei.gif)
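
Putting the pieces together, a bare-bones version of this loop built from
BoTorch primitives might look like the sketch below. Ax manages this loop for
you; `evaluate_objective`, the bounds, and the fixed trial budget are
placeholders for illustration, not part of any real API.

```python
# Illustrative BO loop using BoTorch primitives (Ax automates all of this).
import torch
from botorch.acquisition import ExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood


def evaluate_objective(x: torch.Tensor) -> torch.Tensor:
    """Placeholder for the expensive black-box objective."""
    return -((x - 0.6) ** 2).sum(dim=-1, keepdim=True)


bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)
train_X = torch.rand(5, 1, dtype=torch.double)
train_Y = evaluate_objective(train_X)

for _ in range(10):  # stopping condition set by the user
    # 1. Fit the surrogate to all data observed so far.
    model = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

    # 2. Optimize the acquisition function to pick the next configuration.
    ei = ExpectedImprovement(model, best_f=train_Y.max())
    candidate, _ = optimize_acqf(
        ei, bounds=bounds, q=1, num_restarts=10, raw_samples=128
    )

    # 3. Evaluate the objective and fold the new observation into the data.
    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, evaluate_objective(candidate)])
```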

Using an acquisition function like EI to sample new points initially promotes
quick exploration because its values, like the uncertainty estimates, are higher
in unexplored regions. Once the parameter space is adequately explored, EI
naturally narrows in on locations where there is a high likelihood of a good
objective value.

While the combination of a Gaussian process surrogate model and the expected
improvement acquisition function is shown above, other combinations of
surrogate models and acquisition functions can be used. Different surrogates
(GPs with different behaviors, or entirely different probabilistic models) and
different acquisition functions present various tradeoffs in terms of
optimization performance, computational load, and more.
2 changes: 1 addition & 1 deletion website/sidebars.js
@@ -45,7 +45,7 @@ const tutorials = () => {

export default {
docs: {
-    Introduction: ['why-ax', 'intro-to-ae'],
+    Introduction: ['why-ax', 'intro-to-ae', 'intro-to-bo'],
},
tutorials: tutorials(),
};
