lectures/B01 Machine Learning Overview.jl (1 addition, 1 deletion)
@@ -213,7 +213,7 @@ Given the state of the world (obtained from sensory data), the agent must *learn
  In contrast to supervised and unsupervised learning, an agent is able to affect its data set by making actions, e.g., a robot can change its input video data stream by turning the head of its camera.
- In this course, we focus on the active inference approach to trial design, see the [Intelligent Agent lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb) for details.
+ In this course, we focus on the active inference approach to trial design, see the [Intelligent Agent lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B12%20Intelligent%20Agents%20and%20Active%20Inference.html) for details.
lectures/B02 Probability Theory Review.jl (1 addition, 1 deletion)
@@ -1028,7 +1028,7 @@ For proof, see [https://en.wikipedia.org/wiki/Product_distribution](https://en.w
  md"""
  Generally, this integral does not lead to an analytical expression for ``p_z(z)``.
- For example, [the product of two independent variables that are both Gaussian-distributed does not lead to a Gaussian distribution](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#product-of-gaussians).
+ For example, [the product of two independent variables that are both Gaussian-distributed does not lead to a Gaussian distribution](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#product-of-gaussians).
  * Exception: the distribution of the product of two variables that both have [log-normal distributions](https://en.wikipedia.org/wiki/Log-normal_distribution) is again a lognormal distribution. (If ``X`` has a normal distribution, then ``Y=\exp(X)`` has a log-normal distribution.)
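As a side note on the claim in the changed line: a minimal Julia check (my own sketch, not part of this diff or of the lecture code) that the product of two independent standard Gaussians is heavy-tailed rather than Gaussian, while the product of two log-normals stays log-normal because its logarithm is a sum of Gaussians.

```julia
using Random, Statistics

Random.seed!(1)
n = 100_000
x, y = randn(n), randn(n)

# Sample excess kurtosis: ≈ 0 for a Gaussian, > 0 for heavier tails.
excess_kurtosis(v) = mean(((v .- mean(v)) ./ std(v)) .^ 4) - 3

z = x .* y                    # product of two independent Gaussians
w = exp.(x) .* exp.(y)        # product of two independent log-normals

println(excess_kurtosis(z))         # ≈ 6, so z is clearly not Gaussian
println(excess_kurtosis(log.(w)))   # ≈ 0: log(w) = x + y is Gaussian, so w is log-normal
```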
lectures/B03 Bayesian Machine Learning.jl (2 additions, 4 deletions)
@@ -465,7 +465,7 @@ This latter point accentuates that the common practice in machine learning to di
  md"""
  ## Bayesian Machine Learning and the Scientific Method Revisited
- The Bayesian design process provides a unified framework for the Scientific Inquiry method. We can now add equations to the design loop. (Trial design to be discussed in [Intelligent Agent lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb).)
+ The Bayesian design process provides a unified framework for the Scientific Inquiry method. We can now add equations to the design loop. (Trial design to be discussed in [Intelligent Agent lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B12%20Intelligent%20Agents%20and%20Active%20Inference.html).)

- (If the GIF animation is not rendered, you can try to [view it here](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb)).
+ (If the GIF animation is not rendered, you can try to [view it here](https://biaslab.github.io/BMLIP-colorized/lectures/B03%20Bayesian%20Machine%20Learning.html)).
  """

@@ -927,8 +927,6 @@ end
  # ╔═╡ 6a2b9676-d294-11ef-241a-89ff7aa676f9
  md"""
- (If the GIF animation is not rendered, you can try to [view it here](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb)).
-
  Over time, the relative evidence of model ``m_1`` converges to 0. Can you explain this behavior?
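For orientation next to the "add equations to the design loop" sentence: the standard Bayesian identities meant here (stated from general knowledge, in generic notation rather than the lecture's) cover parameter learning, model evaluation, and prediction.

```math
p(\theta \mid D, m) = \frac{p(D \mid \theta, m)\, p(\theta \mid m)}{p(D \mid m)}, \qquad
p(D \mid m) = \int p(D \mid \theta, m)\, p(\theta \mid m)\, \mathrm{d}\theta, \qquad
p(x_{\text{new}} \mid D, m) = \int p(x_{\text{new}} \mid \theta, m)\, p(\theta \mid D, m)\, \mathrm{d}\theta
```

The "relative evidence" mentioned in the last context line is the ratio of the evidences ``p(D \mid m_1) / p(D \mid m_2)`` of the competing models, which is what drives the behavior asked about in the question.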
lectures/B04 Factor Graphs.jl (1 addition, 1 deletion)
@@ -630,7 +630,7 @@ Visually, the modularity of conditional independencies in the model are displaye
  Computationally, message passing-based inference uses the Distributive Law to avoid any unnecessary computations.
- What is the relevance of this lesson? RxInfer is not yet a finished project. Still, my prediction is that in 5-10 years, this lesson on Factor Graphs will be the final lecture of part-A of this class, aimed at engineers who need to develop machine learning applications. In principle you have all the tools now to work out the 4-step machine learning recipe (1. model specification, 2. parameter learning, 3. model evaluation, 4. application) that was proposed in the [Bayesian machine learning lesson](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb#Bayesian-design). You can propose any model and execute the (learning, evaluation, and application) stages by executing the corresponding inference task automatically in RxInfer.
+ What is the relevance of this lesson? RxInfer is not yet a finished project. Still, my prediction is that in 5-10 years, this lesson on Factor Graphs will be the final lecture of part-A of this class, aimed at engineers who need to develop machine learning applications. In principle you have all the tools now to work out the 4-step machine learning recipe (1. model specification, 2. parameter learning, 3. model evaluation, 4. application) that was proposed in the [Bayesian machine learning lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B03%20Bayesian%20Machine%20Learning.html#Bayesian-design). You can propose any model and execute the (learning, evaluation, and application) stages by executing the corresponding inference task automatically in RxInfer.
  Part-B of this class would be about on advanced methods on how to improve automated inference by RxInfer or a similar probabilistic programming package. The Bayesian approach fully supports separating model specification from the inference task.
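The Distributive Law remark in the context line is the computational core of message passing; a generic example (my notation, not the lecture's) of how pushing sums inside a factorized function avoids redundant work:

```math
\sum_{x_1}\sum_{x_2}\sum_{x_3} f(x_1,x_2)\, g(x_2,x_3)
\;=\; \sum_{x_2} \Big(\sum_{x_1} f(x_1,x_2)\Big)\Big(\sum_{x_3} g(x_2,x_3)\Big)
```

For ``K``-valued variables this reduces the cost from ``O(K^3)`` to ``O(K^2)``; the bracketed partial sums play the role of messages in a sum-product scheme.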
lectures/B05 The Gaussian Distribution.jl
- Alternatively, the $(HTML("<span id='natural-parameterization'>*canonical* (a.k.a. *natural* or *information* ) parameterization</span>")) of the Gaussian distribution is given by
+ Alternatively, the $(HTML("<span id='natural-parameterization'></span>"))*canonical* (a.k.a. *natural* or *information* ) parameterization of the Gaussian distribution is given by
  ```math
  \begin{equation*}
@@ -165,7 +165,7 @@ A **linear transformation** ``z=Ax+b`` of a Gaussian variable ``x \sim \mathcal{
- In fact, after a linear transformation ``z=Ax+b``, no matter how ``x`` is distributed, the mean and variance of ``z`` are always given by ``\mu_z = A\mu_x + b`` and ``\Sigma_z = A\Sigma_x A^T``, respectively (see [probability theory review lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Probability-Theory-Review.ipynb#linear-transformation)). In case ``x`` is not Gaussian, higher order moments may be needed to specify the distribution for ``z``.
+ In fact, after a linear transformation ``z=Ax+b``, no matter how ``x`` is distributed, the mean and variance of ``z`` are always given by ``\mu_z = A\mu_x + b`` and ``\Sigma_z = A\Sigma_x A^T``, respectively (see [probability theory review lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B02%20Probability%20Theory%20Review.html#linear-transformation)). In case ``x`` is not Gaussian, higher order moments may be needed to specify the distribution for ``z``.
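A quick Monte-Carlo check of the moment rules quoted in this hunk (an illustrative sketch with made-up numbers, not the lecture's code):

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(2)
μx = [1.0, -2.0]
Σx = [2.0 0.5; 0.5 1.0]
A  = [1.0 2.0; 0.0 1.0]
b  = [0.5, -1.0]

# Draw samples of x ~ N(μx, Σx) as columns, then transform: z = A*x + b.
L = cholesky(Σx).L
X = μx .+ L * randn(2, 100_000)
Z = A * X .+ b

println(mean(Z, dims=2), "  vs  ", A * μx + b)      # sample mean ≈ A*μx + b
println(cov(Z, dims=2),  "  vs  ", A * Σx * A')     # sample cov  ≈ A*Σx*Aᵀ
```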
lectures/B06 The Multinomial Distribution.jl (1 addition, 1 deletion)
@@ -126,7 +126,7 @@ This distribution depends on the observations **only** through the quantities ``
  # ╔═╡ d8439866-d294-11ef-230b-dfde21aedfbf
  md"""
- We need a prior for the parameters ``\mu = (\mu_1,\mu_2,\ldots,\mu_K)``. In the [binary coin toss example](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Bayesian-Machine-Learning.ipynb#beta-prior),
+ We need a prior for the parameters ``\mu = (\mu_1,\mu_2,\ldots,\mu_K)``. In the [binary coin toss example](https://biaslab.github.io/BMLIP-colorized/lectures/B03%20Bayesian%20Machine%20Learning.html#beta-prior),
  we used a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) that was conjugate with the binomial and forced us to choose prior pseudo-counts.
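The hunk cuts off just before the new prior is introduced; for orientation, the conjugate prior for categorical/multinomial parameters that the text presumably builds toward is the Dirichlet distribution (standard form, not quoted from the lecture), whose ``\alpha_k`` act as prior pseudo-counts generalizing the beta prior:

```math
\mathrm{Dir}(\mu \mid \alpha) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1},
\qquad \mu_k \ge 0, \quad \sum_{k=1}^{K} \mu_k = 1
```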
lectures/B07 Regression.jl
- with natural parameters (see the [natural parameterization of Gaussian](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#natural-parameterization)):
+ with natural parameters (see the [natural parameterization of Gaussian](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#natural-parameterization)):
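Both this hunk and the B05 hunk above point at the #natural-parameterization anchor; for reference, the standard canonical (natural/information) form of the Gaussian is given below (general-knowledge notation; the lecture's symbols may differ):

```math
p(x) \propto \exp\!\left( \eta^{\top} x \;-\; \tfrac{1}{2}\, x^{\top} \Lambda\, x \right),
\qquad \Lambda = \Sigma^{-1} \;\;(\text{precision matrix}), \qquad \eta = \Sigma^{-1} \mu \;\;(\text{precision-weighted mean})
```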
lectures/B08 Generative Classification.jl (4 additions, 4 deletions)
@@ -167,7 +167,7 @@ Hence, using the one-hot coding formulation for ``y_{nk}``, the generative model
  md"""
  We will refer to this model as the **Gaussian-Categorical Model** ($(HTML("<span id='GCM'>GCM</span>"))).
- * N.B. In the literature, this model (with possibly unequal ``\Sigma_k`` across classes) is often called the Gaussian Discriminant Analysis model and the special case with equal covariance matrices ``\Sigma_k=\Sigma`` is also called Linear Discriminant Analysis. We think these names are a bit unfortunate as it may lead to confusion with the [discriminative method for classification](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Discriminative-Classification.ipynb).
+ * N.B. In the literature, this model (with possibly unequal ``\Sigma_k`` across classes) is often called the Gaussian Discriminant Analysis model and the special case with equal covariance matrices ``\Sigma_k=\Sigma`` is also called Linear Discriminant Analysis. We think these names are a bit unfortunate as it may lead to confusion with the [discriminative method for classification](https://biaslab.github.io/BMLIP-colorized/lectures/B09%20Discriminative%20Classification.html).
  """
@@ -219,8 +219,8 @@ Recall (from the previous slide) the log-likelihood (LLH)
  md"""
  Maximization of the LLH for the GDA model breaks down into
- * **Gaussian density estimation** for parameters ``\mu_k, \Sigma``, since the first term contains exactly the log-likelihood for MVG density estimation. We've already done this, see the [Gaussian distribution lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#ML-for-Gaussian).
- * **Multinomial density estimation** for class priors ``\pi_k``, since the second term holds exactly the log-likelihood for multinomial density estimation, see the [Multinomial distribution lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Multinomial-Distribution.ipynb#ML-for-multinomial).
+ * **Gaussian density estimation** for parameters ``\mu_k, \Sigma``, since the first term contains exactly the log-likelihood for MVG density estimation. We've already done this, see the [Gaussian distribution lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#ML-for-Gaussian).
+ * **Multinomial density estimation** for class priors ``\pi_k``, since the second term holds exactly the log-likelihood for multinomial density estimation, see the [Multinomial distribution lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B06%20The%20Multinomial%20Distribution.html#ML-for-multinomial).
  """
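The two bullets in this hunk correspond to closed-form ML estimates; the standard results (stated from general knowledge, with ``N_k`` the number of class-``k`` samples and ``N = \sum_k N_k``) are:

```math
\hat{\pi}_k = \frac{N_k}{N}, \qquad
\hat{\mu}_k = \frac{1}{N_k} \sum_{n:\, y_{nk}=1} x_n, \qquad
\hat{\Sigma} = \frac{1}{N} \sum_{k=1}^{K} \sum_{n:\, y_{nk}=1} (x_n - \hat{\mu}_k)(x_n - \hat{\mu}_k)^{\top}
```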
@@ -452,7 +452,7 @@ The following answer was provided:
  > 1. Bayesian evidence for model performance assessment. This means you can use the whole data set for training without an ad-hoc split into testing and training data sets.
- > 2. Uncertainty about parameters in the model is a measure that allows you to do *active learning*, ie, choose data that is most informative (see also the [lesson on intelligent agents](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb)). This will allow you to train on small data sets, whereas the deterministic DNNs generally require much larger data sets.
+ > 2. Uncertainty about parameters in the model is a measure that allows you to do *active learning*, ie, choose data that is most informative (see also the [lesson on intelligent agents](https://biaslab.github.io/BMLIP-colorized/lectures/B12%20Intelligent%20Agents%20and%20Active%20Inference.html)). This will allow you to train on small data sets, whereas the deterministic DNNs generally require much larger data sets.
  > 3. Prediction with uncertainty/confidence bounds.
lectures/B09 Discriminative Classification.jl (4 additions, 4 deletions)
@@ -118,7 +118,7 @@ What model should we use for the posterior distribution ``p(y_n \in \mathcal{C}_
  md"""
  #### Likelihood
- We will take inspiration from the [generative classification](https://nbviewer.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Generative-Classification.ipynb#softmax) approach, where we derived the class posterior
+ We will take inspiration from the [generative classification](https://biaslab.github.io/BMLIP-colorized/lectures/B08%20Generative%20Classification.html#softmax) approach, where we derived the class posterior
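The class posterior referred to here (via the #softmax anchor) takes the familiar softmax form; for orientation only, with ``a_k(x_n)`` the class discriminant functions (linear in ``x_n`` when the classes share one covariance matrix), the exact definitions being in the linked lesson:

```math
p(y_n \in \mathcal{C}_k \mid x_n) = \frac{e^{a_k(x_n)}}{\sum_{j} e^{a_j(x_n)}}
```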
@@ -544,7 +544,7 @@ Computing the gradient ``\nabla_{\theta_k} \mathrm{L}(\theta)`` leads to (for [p
  # ╔═╡ 25f386e4-d294-11ef-2cec-f56f4a6feb19
  md"""
- Compare this to the [gradient for *linear* regression](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Regression.ipynb#regression-gradient):
+ Compare this to the [gradient for *linear* regression](https://biaslab.github.io/BMLIP-colorized/lectures/B07%20Regression.html#regression-gradient):
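The comparison being set up here is between two gradients of the "prediction error times input" form; in standard notation (not quoted from the lecture), with ``p_{nk}`` the softmax prediction for class ``k``:

```math
\nabla_{\theta_k} \mathrm{L}(\theta) = \sum_{n} \left( y_{nk} - p_{nk} \right) x_n
\quad \text{(softmax/logistic regression)}, \qquad
\nabla_{\theta} \mathrm{L}(\theta) \propto \sum_{n} \left( y_n - \theta^{\top} x_n \right) x_n
\quad \text{(linear regression)}
```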
- Note that, while in the Bayesian approach we get to update ``\theta`` with [**Kalman-gain-weighted** prediction errors](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/The-Gaussian-Distribution.ipynb#precision-weighted-update) (which is optimal), in the maximum likelihood approach, we weigh the prediction errors with **input** values (which is less precise).
+ Note that, while in the Bayesian approach we get to update ``\theta`` with [**Kalman-gain-weighted** prediction errors](https://biaslab.github.io/BMLIP-colorized/lectures/B05%20The%20Gaussian%20Distribution.html#precision-weighted-update) (which is optimal), in the maximum likelihood approach, we weigh the prediction errors with **input** values (which is less precise).
  """
@@ -580,7 +580,7 @@ md"""
  Let us perform ML estimation of ``w`` on the data set from the introduction. To allow an offset in the discrimination boundary, we add a constant 1 to the feature vector ``x``. We only have to specify the (negative) log-likelihood and the gradient w.r.t. ``w``. Then, we use an off-the-shelf optimisation library to minimize the negative log-likelihood.
- We plot the resulting maximum likelihood discrimination boundary. For comparison we also plot the ML discrimination boundary obtained from the [code example in the generative Gaussian classifier lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Generative-Classification.ipynb#code-generative-classification-example).
+ We plot the resulting maximum likelihood discrimination boundary. For comparison we also plot the ML discrimination boundary obtained from the [code example in the generative Gaussian classifier lesson](https://biaslab.github.io/BMLIP-colorized/lectures/B08%20Generative%20Classification.html#code-generative-classification-example).
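A minimal sketch of the workflow described in this hunk, with synthetic two-cluster data and Optim.jl assumed as the "off-the-shelf optimisation library" (the lecture's actual data set and code differ):

```julia
using LinearAlgebra, Optim, Random

Random.seed!(3)
# Two Gaussian clusters as a stand-in data set; append a constant 1 to each
# feature vector so the discrimination boundary can have an offset.
X = vcat(randn(20, 2) .+ 1.0, randn(20, 2) .- 1.0)
X = hcat(X, ones(size(X, 1)))
y = vcat(ones(20), zeros(20))

sigmoid(a) = 1 / (1 + exp(-a))

# Negative log-likelihood of logistic regression and its gradient w.r.t. w.
function negloglik(w)
    p = sigmoid.(X * w)
    return -sum(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))
end

function negloglik_grad!(g, w)
    p = sigmoid.(X * w)
    g .= X' * (p .- y)
end

# Hand both to the optimiser to minimize the negative log-likelihood.
result = optimize(negloglik, negloglik_grad!, zeros(3), LBFGS())
println("ML estimate of w: ", Optim.minimizer(result))
```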