Commit 9292caf

Fixed typo: Epanechikov -> Epanechnikov
Also, the definition of the Epanechnikov kernel in (11.2.1) seems to differ from the typical definition, which uses a quadratic function.
1 parent 23d7a5a commit 9292caf
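
For reference on the note above (context only, not part of the commit): the Epanechnikov kernel as typically defined, e.g. in the kernel density estimation literature, is quadratic on its support,

$$K(u) = \tfrac{3}{4}\left(1 - u^2\right) \textrm{ for } |u| \leq 1, \qquad K(u) = 0 \textrm{ otherwise,}$$

whereas $\mathop{\mathrm{max}}\left(0, 1 - \|\mathbf{q} - \mathbf{k}\|\right)$, as written in the chapter, has a triangular (tent) profile; that difference is what the note is pointing out.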

1 file changed: 9 additions, 9 deletions

chapter_attention-mechanisms-and-transformers/attention-pooling.md

````diff
@@ -8,7 +8,7 @@ At their core, Nadaraya--Watson estimators rely on some similarity kernel $\alph
 $$\begin{aligned}
 \alpha(\mathbf{q}, \mathbf{k}) & = \exp\left(-\frac{1}{2} \|\mathbf{q} - \mathbf{k}\|^2 \right) && \textrm{Gaussian;} \\
 \alpha(\mathbf{q}, \mathbf{k}) & = 1 \textrm{ if } \|\mathbf{q} - \mathbf{k}\| \leq 1 && \textrm{Boxcar;} \\
-\alpha(\mathbf{q}, \mathbf{k}) & = \mathop{\mathrm{max}}\left(0, 1 - \|\mathbf{q} - \mathbf{k}\|\right) && \textrm{Epanechikov.}
+\alpha(\mathbf{q}, \mathbf{k}) & = \mathop{\mathrm{max}}\left(0, 1 - \|\mathbf{q} - \mathbf{k}\|\right) && \textrm{Epanechnikov.}
 \end{aligned}
 $$

@@ -77,25 +77,25 @@ def constant(x):
     return 1.0 + 0 * x

 if tab.selected('pytorch'):
-    def epanechikov(x):
+    def epanechnikov(x):
         return torch.max(1 - d2l.abs(x), torch.zeros_like(x))
 if tab.selected('mxnet'):
-    def epanechikov(x):
+    def epanechnikov(x):
         return np.maximum(1 - d2l.abs(x), 0)
 if tab.selected('tensorflow'):
-    def epanechikov(x):
+    def epanechnikov(x):
         return tf.maximum(1 - d2l.abs(x), 0)
 if tab.selected('jax'):
-    def epanechikov(x):
+    def epanechnikov(x):
         return jnp.maximum(1 - d2l.abs(x), 0)
 ```

 ```{.python .input}
 %%tab all
 fig, axes = d2l.plt.subplots(1, 4, sharey=True, figsize=(12, 3))

-kernels = (gaussian, boxcar, constant, epanechikov)
-names = ('Gaussian', 'Boxcar', 'Constant', 'Epanechikov')
+kernels = (gaussian, boxcar, constant, epanechnikov)
+names = ('Gaussian', 'Boxcar', 'Constant', 'Epanechnikov')
 x = d2l.arange(-2.5, 2.5, 0.1)
 for kernel, name, ax in zip(kernels, names, axes):
     if tab.selected('pytorch', 'mxnet', 'tensorflow'):
@@ -191,14 +191,14 @@ def plot(x_train, y_train, x_val, y_val, kernels, names, attention=False):
 plot(x_train, y_train, x_val, y_val, kernels, names)
 ```

-The first thing that stands out is that all three nontrivial kernels (Gaussian, Boxcar, and Epanechikov) produce fairly workable estimates that are not too far from the true function. Only the constant kernel that leads to the trivial estimate $f(x) = \frac{1}{n} \sum_i y_i$ produces a rather unrealistic result. Let's inspect the attention weighting a bit more closely:
+The first thing that stands out is that all three nontrivial kernels (Gaussian, Boxcar, and Epanechnikov) produce fairly workable estimates that are not too far from the true function. Only the constant kernel that leads to the trivial estimate $f(x) = \frac{1}{n} \sum_i y_i$ produces a rather unrealistic result. Let's inspect the attention weighting a bit more closely:

 ```{.python .input}
 %%tab all
 plot(x_train, y_train, x_val, y_val, kernels, names, attention=True)
 ```

-The visualization clearly shows why the estimates for Gaussian, Boxcar, and Epanechikov are very similar: after all, they are derived from very similar attention weights, despite the different functional form of the kernel. This raises the question as to whether this is always the case.
+The visualization clearly shows why the estimates for Gaussian, Boxcar, and Epanechnikov are very similar: after all, they are derived from very similar attention weights, despite the different functional form of the kernel. This raises the question as to whether this is always the case.

 ## [**Adapting Attention Pooling**]

````
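As a footnote to the commit message's observation, here is a minimal sketch (plain NumPy; the names `chapter_kernel` and `quadratic_epanechnikov` are illustrative only, not from d2l or this commit) contrasting the chapter's $\mathop{\mathrm{max}}(0, 1 - |x|)$ profile with the quadratic Epanechnikov kernel:

```python
import numpy as np

def chapter_kernel(x):
    # Profile used in attention-pooling.md: max(0, 1 - |x|), a triangular/tent shape.
    return np.maximum(1 - np.abs(x), 0)

def quadratic_epanechnikov(x):
    # Textbook Epanechnikov kernel: (3/4) * (1 - x**2) on |x| <= 1, and 0 otherwise.
    return 0.75 * np.maximum(1 - x**2, 0)

x = np.array([0.0, 0.5, 1.0, 1.5])
print(chapter_kernel(x))          # 1.0, 0.5, 0.0, 0.0
print(quadratic_epanechnikov(x))  # 0.75, 0.5625, 0.0, 0.0
```

Both kernels share the support $|x| \leq 1$, but they weight nearby points differently, which is why the commit message flags the chapter's definition as nonstandard.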