
Commit e3c9033

mpolson64 authored and facebook-github-bot committed
Intro to BO
Summary: Basic doc introducing BO concepts like surrogate models, acquisition functions, etc. Differential Revision: D69267374
1 parent eef457f commit e3c9033

File tree

5 files changed: +130 -1 lines changed


docs/assets/ei.png (226 KB)

docs/assets/gpei.gif (203 KB)

docs/assets/surrogate.png (181 KB)

docs/intro-to-bo.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
---
id: intro-to-bo
title: Introduction to Bayesian Optimization
---

# Introduction to Bayesian Optimization

Bayesian optimization (BO) is a highly effective adaptive experimentation method
that excels at balancing exploration (learning how new parameterizations
perform) and exploitation (refining parameterizations previously observed to be
good). This method is the foundation of Ax's optimization.

BO has seen widespread use across a variety of domains. Notable examples include
its use in
[tuning the hyperparameters of AlphaGo](https://www.nature.com/articles/nature16961),
a landmark model that defeated world champions in the board game Go. In
materials science, researchers used BO to accelerate the curing process,
increase the overall strength, and reduce the CO2 emissions of
[concrete formulations](https://arxiv.org/abs/2310.18288), the most abundant
human-made material in history. In chemistry, researchers used it to
[discover 21 new, state-of-the-art molecules for tunable dye lasers](https://www.science.org/doi/10.1126/science.adk9227)
(frequently used in quantum physics research), including the world’s brightest
molecule, while only a dozen or so had been discovered over the course of
decades.

Ax relies on [BoTorch](https://botorch.org/) for its implementation of
state-of-the-art Bayesian optimization components.

## Bayesian Optimization

Bayesian optimization begins by building a smooth surrogate model of the
outcomes using a statistical model. This surrogate model makes predictions at
unobserved parameterizations and estimates the uncertainty around them. The
predictions and the uncertainty estimates are combined to derive an acquisition
function, which quantifies the value of observing a particular parameterization.
By optimizing the acquisition function, we identify the best candidate
parameterizations for evaluation. The process is iterative: we fit the surrogate
model to the data observed so far, optimize the acquisition function to identify
the best configuration to observe next, then refit the surrogate model once the
new outcomes have been observed. The entire process is adaptive, with the
predictions and uncertainty estimates updated as new observations are made.

The strategy of relying on successive surrogate models to update knowledge of
the objective allows BO to strike a balance between the conflicting goals of
exploration (trying out parameterizations with high uncertainty in their
outcomes) and exploitation (converging on configurations that are likely to be
good). As a result, BO is able to find better configurations with fewer
evaluations than is generally possible with grid search or other global
optimization techniques. Leveraging BO, as Ax does, is therefore particularly
impactful for applications where the evaluation process is expensive and only a
limited number of evaluations are possible.

## Surrogate Models

Because the objective function is a black-box process, we treat it as a random
function and place a prior over it. This prior captures beliefs about the
objective, and it is updated as data is observed to form the posterior.

This is typically done using a Gaussian process (GP), a probabilistic model that
defines a probability distribution over possible functions that fit a set of
points. Importantly for Bayesian optimization, GPs can be used to map points in
input space (the parameters we wish to tune) to distributions in output space
(the objectives we wish to optimize).

In the one-dimensional example below, a GP surrogate model is fit to five noisy
observations to predict the objective (solid line) and place uncertainty
estimates (proportional to the width of the shaded bands) over the entire
x-axis, which represents the range of possible parameter values. Importantly,
the model is able to predict the outcome and quantify the uncertainty of
configurations that have not yet been tested. Intuitively, the uncertainty bands
are tight in regions that are well explored and become wider as we move away
from them.

![GP surrogate model](assets/surrogate.png)

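A figure like this can be produced directly with BoTorch, the library Ax builds
on for its Bayesian optimization components. The snippet below is a minimal
sketch, not Ax's own code: the one-dimensional objective and the five noisy
observations are made up for illustration, while `SingleTaskGP`,
`ExactMarginalLogLikelihood`, and `fit_gpytorch_mll` are standard BoTorch and
GPyTorch utilities.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

# Five noisy observations of a hypothetical 1-D objective on [0, 1].
train_X = torch.tensor([[0.1], [0.3], [0.5], [0.7], [0.9]], dtype=torch.double)
train_Y = torch.sin(6 * train_X) + 0.1 * torch.randn_like(train_X)

# Fit a GP surrogate by maximizing the marginal log likelihood.
gp = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

# Query the posterior over the whole domain: the posterior mean is the solid
# line in the figure, and the posterior standard deviation gives the width of
# the shaded uncertainty band.
test_X = torch.linspace(0, 1, 101, dtype=torch.double).unsqueeze(-1)
with torch.no_grad():
    posterior = gp.posterior(test_X)
    mean = posterior.mean.squeeze(-1)
    std = posterior.variance.sqrt().squeeze(-1)
```
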
## Acquisition Functions

The acquisition function is a mathematical function that quantifies the utility
of observing a given point in the domain. Ax supports the most commonly used
acquisition functions in BO, including the following (PI and UCB are written out
just after the list, and EI is treated in more detail below):

- **Expected Improvement (EI)**, which captures the expected amount by which a
  point improves on the current best value.
- **Probability of Improvement (PI)**, which captures the probability of a point
  producing an observation better than the current best value.
- **Upper Confidence Bound (UCB)**, which sums the predicted mean and standard
  deviation.

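Concretely, if the surrogate's posterior at a point $x$ is Gaussian with mean
$\mu(x)$ and standard deviation $\sigma(x)$, $f^*$ is the best value observed so
far, and $\Phi$ is the standard normal CDF, then for maximization PI and UCB can
be written as follows ($\beta \geq 0$ is a tunable exploration weight; the
description above corresponds to $\beta = 1$):

$$
\text{PI}(x) = \Phi\!\left(\frac{\mu(x) - f^*}{\sigma(x)}\right), \qquad
\text{UCB}(x) = \mu(x) + \beta\,\sigma(x)
$$
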
Each of these acquisition functions will lead to different behavior during the
optimization. Additionally, many of these acquisition functions have been
extended to perform well in constrained, noisy, multi-objective, and/or batched
settings.

Expected improvement is a popular acquisition function owing to its natural
tendency to both explore regions of high uncertainty and exploit regions known
to be good, an analytic form that is easy to compute, and overall good practical
performance. As the name suggests, it rewards evaluation of the objective $f$
based on the expected improvement relative to the current best. If
$f^* = \max_i y_i$ is the current best observed outcome and our goal is to
maximize $f$, then EI is defined as the following:

$$
\text{EI}(x) = \mathbb{E}\bigl[\max(f(x) - f^*, 0)\bigr]
$$

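Because the GP posterior at $x$ is Gaussian with mean $\mu(x)$ and standard
deviation $\sigma(x)$, this expectation has a well-known closed form in the
noiseless setting. Writing $z = (\mu(x) - f^*)/\sigma(x)$, with $\Phi$ and
$\varphi$ the standard normal CDF and PDF, it evaluates to

$$
\text{EI}(x) = \sigma(x)\bigl[z\,\Phi(z) + \varphi(z)\bigr]
$$

which is what makes EI cheap to compute and optimize.
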
A visualization of the expected improvement based on the surrogate model
predictions is shown below; the next suggestion is the point at which the
expected improvement is at its maximum.

![Expected Improvement (EI) acquisition function](assets/ei.png)

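As a small illustration of that selection step, the closed-form EI above can be
evaluated on a dense grid of candidate points using the surrogate's posterior,
and the maximizer proposed as the next point to try. The sketch below continues
the hypothetical `gp`, `train_Y`, and `test_X` objects from the surrogate
example earlier; in practice the acquisition function is optimized with
gradient-based methods rather than a grid.

```python
import torch
from torch.distributions import Normal

# Analytic EI on a grid of candidates, using the fitted surrogate from above.
best_f = train_Y.max()
with torch.no_grad():
    posterior = gp.posterior(test_X)
    mu = posterior.mean.squeeze(-1)
    sigma = posterior.variance.sqrt().squeeze(-1).clamp_min(1e-9)

standard_normal = Normal(torch.zeros_like(mu), torch.ones_like(mu))
z = (mu - best_f) / sigma
ei = sigma * (z * standard_normal.cdf(z) + standard_normal.log_prob(z).exp())

# The next suggestion is the candidate where EI is largest.
next_x = test_X[ei.argmax()]
```
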
Once the point with the highest EI is selected and evaluated, the surrogate
model is retrained and a new suggestion is made. This process continues in a
loop until a stopping condition set by the user is reached.

![Full Bayesian optimization loop](assets/gpei.gif)

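Putting the pieces together, a minimal version of this loop might look like the
sketch below. It is illustrative rather than how Ax orchestrates things
internally: the objective `f` stands in for an expensive experiment, the budget
of 15 evaluations is an arbitrary stopping condition, and the loop uses
BoTorch's `LogExpectedImprovement` (a numerically stabler variant of EI)
together with the `optimize_acqf` helper to maximize the acquisition function.

```python
import torch
from botorch.acquisition import LogExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood


def f(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical expensive black-box objective on [0, 1].
    return torch.sin(6 * x) + 0.1 * torch.randn_like(x)


bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)
train_X = torch.rand(5, 1, dtype=torch.double)  # initial random design
train_Y = f(train_X)

for _ in range(15):  # stopping condition: a fixed evaluation budget
    # 1. Refit the surrogate to all data observed so far.
    gp = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

    # 2. Maximize the acquisition function to pick the next candidate.
    acqf = LogExpectedImprovement(model=gp, best_f=train_Y.max())
    candidate, _ = optimize_acqf(
        acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=64
    )

    # 3. Evaluate the objective there and append the new observation.
    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, f(candidate)])
```
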
Using an acquisition function like EI to sample new points initially promotes
quick exploration because its values, like the uncertainty estimates, are higher
in unexplored regions. Once the parameter space is adequately explored, EI
naturally narrows in on locations where there is a high likelihood of a good
objective value.

While the combination of a Gaussian process surrogate model and the expected
improvement acquisition function is shown above, different combinations of
surrogate models and acquisition functions can be used. Different surrogates
(whether GPs with different behaviors or entirely different probabilistic
models) and different acquisition functions present various tradeoffs in terms
of optimization performance, computational load, and more.

website/sidebars.js

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ const tutorials = () => {
 
 export default {
   docs: {
-    Introduction: ['why-ax', 'intro-to-ae'],
+    Introduction: ['why-ax', 'intro-to-ae', 'intro-to-bo'],
   },
   tutorials: tutorials(),
 };
