
Commit f5441f1

mpolson64 authored and facebook-github-bot committed
Intro to BO
Summary: Basic doc introducing BO concepts like surrogate models, acquisition functions, etc.

Differential Revision: D69267374
1 parent 8c1a463 commit f5441f1

File tree

5 files changed: +129 -1 lines changed


docs/assets/ei.png

226 KB

docs/assets/gpei.gif

203 KB

docs/assets/surrogate.png

181 KB

docs/intro-to-bo.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
---
id: intro-to-bo
title: Introduction to Bayesian Optimization
---

# Introduction to Bayesian Optimization

Bayesian optimization (BO) is a highly effective adaptive experimentation method
that excels at balancing exploration (learning how new parameterizations
perform) and exploitation (refining parameterizations previously observed to be
good). This method is the foundation of Ax's optimization.

BO has seen widespread use across a variety of domains. Notable examples include
its use in
[tuning the hyperparameters of AlphaGo](https://www.nature.com/articles/nature16961),
a landmark model that defeated world champions in the board game Go. In
materials science, researchers used BO to accelerate the curing process,
increase the overall strength, and reduce the CO2 emissions of
[concrete formulations](https://arxiv.org/abs/2310.18288), the most abundant
human-made material in history. In chemistry, researchers used it to
[discover 21 new, state-of-the-art molecules for tunable dye lasers](https://www.science.org/doi/10.1126/science.adk9227)
(frequently used in quantum physics research), including the world’s brightest
molecule, while only a dozen or so had been discovered over the course of
decades.

Ax relies on [BoTorch](https://botorch.org/) for its implementation of
state-of-the-art Bayesian optimization components.

## Bayesian Optimization

Bayesian optimization begins by building a smooth surrogate model of the
outcomes using a statistical model. This surrogate model makes predictions at
unobserved parameterizations and estimates the uncertainty around them. The
predictions and the uncertainty estimates are combined to derive an acquisition
function, which quantifies the value of observing a particular parameterization.
By optimizing the acquisition function we identify the best candidate
parameterizations for evaluation. In an iterative process, we fit the surrogate
model to the observed data, optimize the acquisition function to identify the
best configuration to observe next, and then refit the surrogate model once the
new outcomes are observed. The entire process is adaptive: the predictions and
uncertainty estimates are updated as new observations are made.

The strategy of relying on successive surrogate models to update knowledge of
the objective allows BO to strike a balance between the conflicting goals of
exploration (trying out parameterizations with high uncertainty in their
outcomes) and exploitation (converging on configurations that are likely to be
good). As a result, BO is able to find better configurations with fewer
evaluations than is generally possible with grid search or other global
optimization techniques. Therefore, leveraging BO, as is done in Ax, is
particularly impactful for applications where the evaluation process is
expensive, allowing for only a limited number of evaluations.

## Surrogate Models

Because the objective function is a black-box process, we treat it as a random
function and place a prior over it. This prior captures beliefs about the
objective, and it is updated as data is observed to form the posterior.

This is typically done using a Gaussian process (GP), a probabilistic model that
defines a probability distribution over possible functions that fit a set of
points. Importantly for Bayesian optimization, GPs can be used to map points in
input space (the parameters we wish to tune) to distributions in output space
(the objectives we wish to optimize).

In the one-dimensional example below, a surrogate model is fit to five noisy
observations using a GP to predict the objective, depicted by the solid line,
and uncertainty estimates, illustrated by the width of the shaded bands. This
objective is predicted for the entire range of possible parameter values,
corresponding to the full x-axis. Importantly, the model is able to predict the
outcome and quantify the uncertainty of configurations that have not yet been
tested. Intuitively, the uncertainty bands are tight in regions that are
well-explored and become wider as we move away from them.

![GP surrogate model](assets/surrogate.png)
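
As a rough sketch of what fitting such a surrogate can look like in code, the
example below fits a BoTorch `SingleTaskGP` to a handful of noisy
one-dimensional observations and queries the posterior mean and standard
deviation at unobserved points. The toy objective, the number of observations,
and the evaluation grid are illustrative assumptions, not anything prescribed by
Ax.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

# Five noisy observations of a toy 1-D objective (illustrative only).
train_x = torch.rand(5, 1, dtype=torch.double)
train_y = torch.sin(6 * train_x) + 0.1 * torch.randn(5, 1, dtype=torch.double)

# Fit a GP surrogate by maximizing the marginal log likelihood.
model = SingleTaskGP(train_x, train_y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

# Query the posterior over the whole range of parameter values.
test_x = torch.linspace(0, 1, 101, dtype=torch.double).unsqueeze(-1)
posterior = model.posterior(test_x)
mean = posterior.mean            # predicted objective (the solid line)
std = posterior.variance.sqrt()  # uncertainty (width of the shaded bands)
```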

## Acquisition Functions

The acquisition function is a mathematical function that quantifies the utility
of observing a given point in the domain. Ax supports the most commonly used
acquisition functions in BO, including:

- **Expected Improvement (EI)**, which captures the expected value of a point
  above the current best value.
- **Probability of Improvement (PI)**, which captures the probability of a point
  producing an observation better than the current best value.
- **Upper Confidence Bound (UCB)**, which sums the predicted mean and a scaled
  standard deviation.
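
As a rough sketch, reusing the fitted `model`, the observations `train_y`, and
the evaluation grid `test_x` from the surrogate example above, these acquisition
functions can be instantiated from BoTorch's analytic implementations:

```python
from botorch.acquisition.analytic import (
    ExpectedImprovement,
    ProbabilityOfImprovement,
    UpperConfidenceBound,
)

best_f = train_y.max()  # current best observed value (for maximization)

ei = ExpectedImprovement(model, best_f=best_f)
pi = ProbabilityOfImprovement(model, best_f=best_f)
ucb = UpperConfidenceBound(model, beta=2.0)  # beta scales the exploration term

# Each acquisition function maps candidate points to a utility value;
# here we score each grid point individually (q=1), giving 101 utilities.
utility = ei(test_x.unsqueeze(-2))
```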

Each of these acquisition functions will lead to different behavior during the
optimization. Additionally, many of these acquisition functions have been
extended to perform well in constrained, noisy, multi-objective, and/or batched
settings.

Expected Improvement is a popular acquisition function owing to its
well-balanced exploitation vs. exploration, a straightforward analytic form, and
overall good practical performance. As the name suggests, it rewards evaluation
of the objective $f$ based on the expected improvement relative to the current
best. If $f^* = \max_i y_i$ is the current best observed outcome and our goal is
to maximize $f$, then EI is defined as follows:

$$
\text{EI}(x) = \mathbb{E}\bigl[\max(f(x) - f^*, 0)\bigr]
$$
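
For a GP surrogate with posterior mean $\mu(x)$ and standard deviation
$\sigma(x)$, and a noiselessly observed best value $f^*$, this expectation has
the well-known closed form

$$
\text{EI}(x) = \sigma(x)\bigl[z(x)\,\Phi(z(x)) + \varphi(z(x))\bigr],
\qquad z(x) = \frac{\mu(x) - f^*}{\sigma(x)},
$$

where $\Phi$ and $\varphi$ are the standard normal CDF and PDF; this is why
analytic EI is cheap to evaluate and optimize.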

A visualization of the expected improvement based on the surrogate model
predictions is shown below; the next suggested point is the one at which the
expected improvement is at its maximum.

![Expected Improvement (EI) acquisition function](assets/ei.png)

Once the point with the highest EI is selected and evaluated, the surrogate
model is retrained and a new suggestion is made. As described above, this
process continues iteratively until a stopping condition, set by the user, is
reached.

![Full Bayesian optimization loop](assets/gpei.gif)
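
A minimal sketch of this loop, written directly against BoTorch and reusing the
imports and the `train_x`/`train_y` data from the sketches above, might look as
follows. In practice Ax manages the surrogate, the acquisition optimization, the
data, and the stopping condition for you; the fixed ten-iteration budget, toy
objective, and optimizer settings here are illustrative assumptions.

```python
from botorch.optim import optimize_acqf

bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)

for _ in range(10):  # illustrative budget; a user-set stopping condition in practice
    # 1. Fit the GP surrogate to all data observed so far.
    model = SingleTaskGP(train_x, train_y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

    # 2. Optimize the acquisition function to pick the next parameterization.
    acqf = ExpectedImprovement(model, best_f=train_y.max())
    candidate, _ = optimize_acqf(
        acqf, bounds=bounds, q=1, num_restarts=5, raw_samples=64
    )

    # 3. Evaluate the (expensive) objective and append the new observation.
    new_y = torch.sin(6 * candidate) + 0.1 * torch.randn(1, 1, dtype=torch.double)
    train_x = torch.cat([train_x, candidate])
    train_y = torch.cat([train_y, new_y])
```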

Using an acquisition function like EI to sample new points initially promotes
quick exploration because the expected values, informed by the uncertainty
estimates, are higher in unexplored regions. Once the parameter space is
adequately explored, EI naturally narrows its focus to regions where there is a
high likelihood of a good objective value (i.e., exploitation).

While the combination of a Gaussian process surrogate model and the expected
improvement acquisition function is shown above, different combinations of
surrogate models and acquisition functions can be used. Different surrogates,
whether differently configured GPs or entirely different probabilistic models,
and different acquisition functions present various tradeoffs in terms of
optimization performance, computational load, and more.

website/sidebars.js

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ const tutorials = () => {
 export default {
   docs: {
-    Introduction: ['why-ax', 'intro-to-ae'],
+    Introduction: ['why-ax', 'intro-to-ae', 'intro-to-bo'],
   },
   tutorials: tutorials(),
 };
