lectures/B01 Machine Learning Overview.jl (1 addition, 1 deletion)
@@ -213,7 +213,7 @@ Given the state of the world (obtained from sensory data), the agent must *learn
  In contrast to supervised and unsupervised learning, an agent is able to affect its data set by making actions, e.g., a robot can change its input video data stream by turning the head of its camera.
- In this course, we focus on the active inference approach to trial design, see the [Intelligent Agent lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb) for details.
+ In this course, we focus on the active inference approach to trial design, see the [Intelligent Agent lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B12%20Intelligent%20Agents%20and%20Active%20Inference.html) for details.
lectures/B02 Probability Theory Review.jl (1 addition, 1 deletion)
@@ -1028,7 +1028,7 @@ For proof, see [https://en.wikipedia.org/wiki/Product_distribution](https://en.w
  md"""
  Generally, this integral does not lead to an analytical expression for ``p_z(z)``.
- For example, [the product of two independent variables that are both Gaussian-distributed does not lead to a Gaussian distribution](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#product-of-gaussians).
+ For example, [the product of two independent variables that are both Gaussian-distributed does not lead to a Gaussian distribution](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#product-of-gaussians).
  * Exception: the distribution of the product of two variables that both have [log-normal distributions](https://en.wikipedia.org/wiki/Log-normal_distribution) is again a lognormal distribution. (If ``X`` has a normal distribution, then ``Y=\exp(X)`` has a log-normal distribution.)
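As a side note on the claim in the changed line: a minimal Julia check (my own sketch, not part of this diff or of the lecture code) that the product of two independent standard Gaussians is heavy-tailed rather than Gaussian, while the product of two log-normals stays log-normal because its logarithm is a sum of Gaussians.

```julia
using Random, Statistics

Random.seed!(1)
n = 100_000
x, y = randn(n), randn(n)

# Sample excess kurtosis: ≈ 0 for a Gaussian, > 0 for heavier tails.
excess_kurtosis(v) = mean(((v .- mean(v)) ./ std(v)) .^ 4) - 3

z = x .* y                    # product of two independent Gaussians
w = exp.(x) .* exp.(y)        # product of two independent log-normals

println(excess_kurtosis(z))         # ≈ 6, so z is clearly not Gaussian
println(excess_kurtosis(log.(w)))   # ≈ 0: log(w) = x + y is Gaussian, so w is log-normal
```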
lectures/B03 Bayesian Machine Learning.jl (2 additions, 4 deletions)
@@ -465,7 +465,7 @@ This latter point accentuates that the common practice in machine learning to di
  md"""
  ## Bayesian Machine Learning and the Scientific Method Revisited
- The Bayesian design process provides a unified framework for the Scientific Inquiry method. We can now add equations to the design loop. (Trial design to be discussed in [Intelligent Agent lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb).)
+ The Bayesian design process provides a unified framework for the Scientific Inquiry method. We can now add equations to the design loop. (Trial design to be discussed in [Intelligent Agent lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B12%20Intelligent%20Agents%20and%20Active%20Inference.html).)

- (If the GIF animation is not rendered, you can try to [view it here](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb)).
+ (If the GIF animation is not rendered, you can try to [view it here](https://biaslab.github.io/BMLIP-colorized/lectures/B03%20Bayesian%20Machine%20Learning.html)).
  """

@@ -927,8 +927,6 @@ end
  # ╔═╡ 6a2b9676-d294-11ef-241a-89ff7aa676f9
  md"""
- (If the GIF animation is not rendered, you can try to [view it here](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb)).
-
  Over time, the relative evidence of model ``m_1`` converges to 0. Can you explain this behavior?
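For orientation next to the "add equations to the design loop" sentence: the standard Bayesian identities meant here (stated from general knowledge, in generic notation rather than the lecture's) cover parameter learning, model evaluation, and prediction.

```math
p(\theta \mid D, m) = \frac{p(D \mid \theta, m)\, p(\theta \mid m)}{p(D \mid m)}, \qquad
p(D \mid m) = \int p(D \mid \theta, m)\, p(\theta \mid m)\, \mathrm{d}\theta, \qquad
p(x_{\text{new}} \mid D, m) = \int p(x_{\text{new}} \mid \theta, m)\, p(\theta \mid D, m)\, \mathrm{d}\theta
```

The "relative evidence" mentioned in the last context line is the ratio of the evidences ``p(D \mid m_1) / p(D \mid m_2)`` of the competing models, which is what drives the behavior asked about in the question.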
lectures/B04 Factor Graphs.jl (1 addition, 1 deletion)
@@ -630,7 +630,7 @@ Visually, the modularity of conditional independencies in the model are displaye
  Computationally, message passing-based inference uses the Distributive Law to avoid any unnecessary computations.
- What is the relevance of this lesson? RxInfer is not yet a finished project. Still, my prediction is that in 5-10 years, this lesson on Factor Graphs will be the final lecture of part-A of this class, aimed at engineers who need to develop machine learning applications. In principle you have all the tools now to work out the 4-step machine learning recipe (1. model specification, 2. parameter learning, 3. model evaluation, 4. application) that was proposed in the [Bayesian machine learning lesson](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb#Bayesian-design). You can propose any model and execute the (learning, evaluation, and application) stages by executing the corresponding inference task automatically in RxInfer.
+ What is the relevance of this lesson? RxInfer is not yet a finished project. Still, my prediction is that in 5-10 years, this lesson on Factor Graphs will be the final lecture of part-A of this class, aimed at engineers who need to develop machine learning applications. In principle you have all the tools now to work out the 4-step machine learning recipe (1. model specification, 2. parameter learning, 3. model evaluation, 4. application) that was proposed in the [Bayesian machine learning lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B03%20Bayesian%20Machine%20Learning.html#Bayesian-design). You can propose any model and execute the (learning, evaluation, and application) stages by executing the corresponding inference task automatically in RxInfer.
  Part-B of this class would be about on advanced methods on how to improve automated inference by RxInfer or a similar probabilistic programming package. The Bayesian approach fully supports separating model specification from the inference task.
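The Distributive Law remark in the context line is the computational core of message passing; a generic example (my notation, not the lecture's) of how pushing sums inside a factorized function avoids redundant work:

```math
\sum_{x_1}\sum_{x_2}\sum_{x_3} f(x_1,x_2)\, g(x_2,x_3)
\;=\; \sum_{x_2} \Big(\sum_{x_1} f(x_1,x_2)\Big)\Big(\sum_{x_3} g(x_2,x_3)\Big)
```

For ``K``-valued variables this reduces the cost from ``O(K^3)`` to ``O(K^2)``; the bracketed partial sums play the role of messages in a sum-product scheme.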
lectures/B05 The Gaussian Distribution.jl
- Alternatively, the $(HTML("<span id='natural-parameterization'>*canonical* (a.k.a. *natural* or *information* ) parameterization</span>")) of the Gaussian distribution is given by
+ Alternatively, the $(HTML("<span id='natural-parameterization'></span>"))*canonical* (a.k.a. *natural* or *information* ) parameterization of the Gaussian distribution is given by
  ```math
  \begin{equation*}
@@ -165,7 +165,7 @@ A **linear transformation** ``z=Ax+b`` of a Gaussian variable ``x \sim \mathcal{
- In fact, after a linear transformation ``z=Ax+b``, no matter how ``x`` is distributed, the mean and variance of ``z`` are always given by ``\mu_z = A\mu_x + b`` and ``\Sigma_z = A\Sigma_x A^T``, respectively (see [probability theory review lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Probability-Theory-Review.ipynb#linear-transformation)). In case ``x`` is not Gaussian, higher order moments may be needed to specify the distribution for ``z``.
+ In fact, after a linear transformation ``z=Ax+b``, no matter how ``x`` is distributed, the mean and variance of ``z`` are always given by ``\mu_z = A\mu_x + b`` and ``\Sigma_z = A\Sigma_x A^T``, respectively (see [probability theory review lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B02%20Probability%20Theory%20Review.html#linear-transformation)). In case ``x`` is not Gaussian, higher order moments may be needed to specify the distribution for ``z``.
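A quick Monte-Carlo check of the moment rules quoted in this hunk (an illustrative sketch with made-up numbers, not the lecture's code):

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(2)
μx = [1.0, -2.0]
Σx = [2.0 0.5; 0.5 1.0]
A  = [1.0 2.0; 0.0 1.0]
b  = [0.5, -1.0]

# Draw samples of x ~ N(μx, Σx) as columns, then transform: z = A*x + b.
L = cholesky(Σx).L
X = μx .+ L * randn(2, 100_000)
Z = A * X .+ b

println(mean(Z, dims=2), "  vs  ", A * μx + b)      # sample mean ≈ A*μx + b
println(cov(Z, dims=2),  "  vs  ", A * Σx * A')     # sample cov  ≈ A*Σx*Aᵀ
```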
lectures/B06 The Multinomial Distribution.jl (1 addition, 1 deletion)
@@ -126,7 +126,7 @@ This distribution depends on the observations **only** through the quantities ``
  # ╔═╡ d8439866-d294-11ef-230b-dfde21aedfbf
  md"""
- We need a prior for the parameters ``\mu = (\mu_1,\mu_2,\ldots,\mu_K)``. In the [binary coin toss example](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb#beta-prior),
+ We need a prior for the parameters ``\mu = (\mu_1,\mu_2,\ldots,\mu_K)``. In the [binary coin toss example](https://biaslab.github.io/BMLIP-colorized/lectures/B03%20Bayesian%20Machine%20Learning.html#beta-prior),
  we used a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) that was conjugate with the binomial and forced us to choose prior pseudo-counts.
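The hunk cuts off just before the new prior is introduced; for orientation, the conjugate prior for categorical/multinomial parameters that the text presumably builds toward is the Dirichlet distribution (standard form, not quoted from the lecture), whose ``\alpha_k`` act as prior pseudo-counts generalizing the beta prior:

```math
\mathrm{Dir}(\mu \mid \alpha) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1},
\qquad \mu_k \ge 0, \quad \sum_{k=1}^{K} \mu_k = 1
```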
lectures/B07 Regression.jl
- with natural parameters (see the [natural parameterization of Gaussian](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#natural-parameterization)):
+ with natural parameters (see the [natural parameterization of Gaussian](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#natural-parameterization)):
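Both this hunk and the B05 hunk above point at the #natural-parameterization anchor; for reference, the standard canonical (natural/information) form of the Gaussian is given below (general-knowledge notation; the lecture's symbols may differ):

```math
p(x) \propto \exp\!\left( \eta^{\top} x \;-\; \tfrac{1}{2}\, x^{\top} \Lambda\, x \right),
\qquad \Lambda = \Sigma^{-1} \;\;(\text{precision matrix}), \qquad \eta = \Sigma^{-1} \mu \;\;(\text{precision-weighted mean})
```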
lectures/B08 Generative Classification.jl (4 additions, 4 deletions)
@@ -167,7 +167,7 @@ Hence, using the one-hot coding formulation for ``y_{nk}``, the generative model
  md"""
  We will refer to this model as the **Gaussian-Categorical Model** ($(HTML("<span id='GCM'>GCM</span>"))).
- * N.B. In the literature, this model (with possibly unequal ``\Sigma_k`` across classes) is often called the Gaussian Discriminant Analysis model and the special case with equal covariance matrices ``\Sigma_k=\Sigma`` is also called Linear Discriminant Analysis. We think these names are a bit unfortunate as it may lead to confusion with the [discriminative method for classification](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Discriminative-Classification.ipynb).
+ * N.B. In the literature, this model (with possibly unequal ``\Sigma_k`` across classes) is often called the Gaussian Discriminant Analysis model and the special case with equal covariance matrices ``\Sigma_k=\Sigma`` is also called Linear Discriminant Analysis. We think these names are a bit unfortunate as it may lead to confusion with the [discriminative method for classification](https://biaslab.github.io/BMLIP-colorized/lectures/B09%20Discriminative%20Classification.html).
  """
@@ -219,8 +219,8 @@ Recall (from the previous slide) the log-likelihood (LLH)
  md"""
  Maximization of the LLH for the GDA model breaks down into
- * **Gaussian density estimation** for parameters ``\mu_k, \Sigma``, since the first term contains exactly the log-likelihood for MVG density estimation. We've already done this, see the [Gaussian distribution lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#ML-for-Gaussian).
- * **Multinomial density estimation** for class priors ``\pi_k``, since the second term holds exactly the log-likelihood for multinomial density estimation, see the [Multinomial distribution lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Multinomial-Distribution.ipynb#ML-for-multinomial).
+ * **Gaussian density estimation** for parameters ``\mu_k, \Sigma``, since the first term contains exactly the log-likelihood for MVG density estimation. We've already done this, see the [Gaussian distribution lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#ML-for-Gaussian).
+ * **Multinomial density estimation** for class priors ``\pi_k``, since the second term holds exactly the log-likelihood for multinomial density estimation, see the [Multinomial distribution lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B06%20The%20Multinomial%20Distribution.html#ML-for-multinomial).
  """
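The two bullets in this hunk correspond to closed-form ML estimates; the standard results (stated from general knowledge, with ``N_k`` the number of class-``k`` samples and ``N = \sum_k N_k``) are:

```math
\hat{\pi}_k = \frac{N_k}{N}, \qquad
\hat{\mu}_k = \frac{1}{N_k} \sum_{n:\, y_{nk}=1} x_n, \qquad
\hat{\Sigma} = \frac{1}{N} \sum_{k=1}^{K} \sum_{n:\, y_{nk}=1} (x_n - \hat{\mu}_k)(x_n - \hat{\mu}_k)^{\top}
```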
@@ -452,7 +452,7 @@ The following answer was provided:
  > 1. Bayesian evidence for model performance assessment. This means you can use the whole data set for training without an ad-hoc split into testing and training data sets.
- > 2. Uncertainty about parameters in the model is a measure that allows you to do *active learning*, ie, choose data that is most informative (see also the [lesson on intelligent agents](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb)). This will allow you to train on small data sets, whereas the deterministic DNNs generally require much larger data sets.
+ > 2. Uncertainty about parameters in the model is a measure that allows you to do *active learning*, ie, choose data that is most informative (see also the [lesson on intelligent agents](https://biaslab.github.io/BMLIP-colorized/lectures/B12%20Intelligent%20Agents%20and%20Active%20Inference.html)). This will allow you to train on small data sets, whereas the deterministic DNNs generally require much larger data sets.
  > 3. Prediction with uncertainty/confidence bounds.
lectures/B09 Discriminative Classification.jl (4 additions, 4 deletions)
@@ -118,7 +118,7 @@ What model should we use for the posterior distribution ``p(y_n \in \mathcal{C}_
  md"""
  #### Likelihood
- We will take inspiration from the [generative classification](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Generative-Classification.ipynb#softmax) approach, where we derived the class posterior
+ We will take inspiration from the [generative classification](https://biaslab.github.io/BMLIP-colorized/lectures/B08%20Generative%20Classification.html#softmax) approach, where we derived the class posterior
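The class posterior referred to here (via the #softmax anchor) takes the familiar softmax form; for orientation only, with ``a_k(x_n)`` the class discriminant functions (linear in ``x_n`` when the classes share one covariance matrix), the exact definitions being in the linked lesson:

```math
p(y_n \in \mathcal{C}_k \mid x_n) = \frac{e^{a_k(x_n)}}{\sum_{j} e^{a_j(x_n)}}
```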
@@ -544,7 +544,7 @@ Computing the gradient ``\nabla_{\theta_k} \mathrm{L}(\theta)`` leads to (for [p
  # ╔═╡ 25f386e4-d294-11ef-2cec-f56f4a6feb19
  md"""
- Compare this to the [gradient for *linear* regression](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Regression.ipynb#regression-gradient):
+ Compare this to the [gradient for *linear* regression](https://biaslab.github.io/BMLIP-colorized/lectures/B07%20Regression.html#regression-gradient):
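The comparison being set up here is between two gradients of the "prediction error times input" form; in standard notation (not quoted from the lecture), with ``p_{nk}`` the softmax prediction for class ``k``:

```math
\nabla_{\theta_k} \mathrm{L}(\theta) = \sum_{n} \left( y_{nk} - p_{nk} \right) x_n
\quad \text{(softmax/logistic regression)}, \qquad
\nabla_{\theta} \mathrm{L}(\theta) \propto \sum_{n} \left( y_n - \theta^{\top} x_n \right) x_n
\quad \text{(linear regression)}
```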
- Note that, while in the Bayesian approach we get to update ``\theta`` with [**Kalman-gain-weighted** prediction errors](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#precision-weighted-update) (which is optimal), in the maximum likelihood approach, we weigh the prediction errors with **input** values (which is less precise).
+ Note that, while in the Bayesian approach we get to update ``\theta`` with [**Kalman-gain-weighted** prediction errors](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#precision-weighted-update) (which is optimal), in the maximum likelihood approach, we weigh the prediction errors with **input** values (which is less precise).
  """
@@ -580,7 +580,7 @@ md"""
  Let us perform ML estimation of ``w`` on the data set from the introduction. To allow an offset in the discrimination boundary, we add a constant 1 to the feature vector ``x``. We only have to specify the (negative) log-likelihood and the gradient w.r.t. ``w``. Then, we use an off-the-shelf optimisation library to minimize the negative log-likelihood.
- We plot the resulting maximum likelihood discrimination boundary. For comparison we also plot the ML discrimination boundary obtained from the [code example in the generative Gaussian classifier lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Generative-Classification.ipynb#code-generative-classification-example).
+ We plot the resulting maximum likelihood discrimination boundary. For comparison we also plot the ML discrimination boundary obtained from the [code example in the generative Gaussian classifier lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B08%20Generative%20Classification.html#code-generative-classification-example).
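A minimal sketch of the workflow described in this hunk, with synthetic two-cluster data and Optim.jl assumed as the "off-the-shelf optimisation library" (the lecture's actual data set and code differ):

```julia
using LinearAlgebra, Optim, Random

Random.seed!(3)
# Two Gaussian clusters as a stand-in data set; append a constant 1 to each
# feature vector so the discrimination boundary can have an offset.
X = vcat(randn(20, 2) .+ 1.0, randn(20, 2) .- 1.0)
X = hcat(X, ones(size(X, 1)))
y = vcat(ones(20), zeros(20))

sigmoid(a) = 1 / (1 + exp(-a))

# Negative log-likelihood of logistic regression and its gradient w.r.t. w.
function negloglik(w)
    p = sigmoid.(X * w)
    return -sum(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))
end

function negloglik_grad!(g, w)
    p = sigmoid.(X * w)
    g .= X' * (p .- y)
end

# Hand both to the optimiser to minimize the negative log-likelihood.
result = optimize(negloglik, negloglik_grad!, zeros(3), LBFGS())
println("ML estimate of w: ", Optim.minimizer(result))
```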