|
| 1 | +--- |
| 2 | +# 18.337 Homework 2: HPC and Sensitivity Analysis and Automatic Differentiation |
| 3 | + |
| 4 | +## Overview |
| 5 | + |
| 6 | +This is a getting started with HPC. |
| 7 | + |
| 8 | +1. Do Operation Counts tell the whole story in the real world? Count the operations for both multiplying and |
| 9 | +adding two n x n matrices. Express the leading term in the answer as C * n^e. What is C and what is e? |
| 10 | + |
| 11 | +Use `@belapsed` from `Benchmarktools.jl` to get the performance of both in function |
| 12 | +of the input matrix size. Compare the following four operations for small and large |
| 13 | +matrices, in powers of 2, from 2 to at least 216 (and by preference much larger until |
| 14 | +you run out of patience or memory). |
| 15 | + |
| 16 | +```julia |
| 17 | +# define n = |
| 18 | +eltype = Float32 #specify the data type (the default is Float64) |
| 19 | +A = randn(eltype,n,n) |
| 20 | +B = randn(eltype,n,n) |
| 21 | +C = zeros(eltype,n,n) # (or C = similar(A) also creates a ) |
| 22 | +mul!(C,A,B) #1. benchmark matmul for different n |
| 23 | +C = A * B #2. compare with the line above, what is the difference |
| 24 | +#in speed and why? |
| 25 | +C .= A .+ B #3. benchmark matadd for different n |
| 26 | +C = A + B #4. compare with the line above, what is the difference |
| 27 | +#in speed and why? |
| 28 | +``` |
| 29 | + |
| 30 | +Remember in Benchmarktools to use $ (Julia macro interpolation) for any data |
| 31 | +inputs. |
| 32 | +Please submit your code, and a table of computation times with n in the rows, and |
| 33 | +the four operations in the columns. Label your units (seconds or mseconds). |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Overview |
| 38 | + |
| 39 | +In this part of the homework you will work through the core machinery underlying differentiable simulation: solving an ODE, computing parameter sensitivities analytically, building a simple forward-mode AD system from scratch, and using all of these pieces together to fit a model to data via gradient descent. |
| 40 | + |
| 41 | +You will need the following packages: |
| 42 | + |
| 43 | +```julia |
| 44 | +using OrdinaryDiffEq |
| 45 | +using Plots |
| 46 | +``` |
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +## Part 1: Solving the Lotka-Volterra Equations |
| 51 | + |
| 52 | +The Lotka-Volterra (predator-prey) equations are: |
| 53 | + |
| 54 | +$$\frac{du_1}{dt} = \alpha u_1 - \beta u_1 u_2$$ |
| 55 | +$$\frac{du_2}{dt} = -\delta u_2 + \gamma u_1 u_2$$ |
| 56 | + |
| 57 | +where $u_1$ is the prey population, $u_2$ is the predator population, and $p = (\alpha, \beta, \delta, \gamma)$ are the model parameters. |
| 58 | + |
| 59 | +**Tasks:** |
| 60 | + |
| 61 | +1. Implement the Lotka-Volterra ODE as a function compatible with the SciML/DiffEq interface, i.e. with signature `f(du, u, p, t)`. |
| 62 | + |
| 63 | +2. Solve the system on the time span $t \in [0, 10]$ with initial condition $u_0 = [1.0, 1.0]$ and parameters $p = [1.5, 1.0, 3.0, 1.0]$ using the `Tsit5()` solver from `OrdinaryDiffEq.jl`. |
| 64 | + |
| 65 | +3. Plot both $u_1(t)$ and $u_2(t)$ on the same axes and describe qualitatively what you observe about the dynamics. |
| 66 | + |
| 67 | +**Hint:** An `ODEProblem` is constructed as: |
| 68 | + |
| 69 | +```julia |
| 70 | +prob = ODEProblem(f, u0, tspan, p) |
| 71 | +sol = solve(prob, Tsit5()) |
| 72 | +``` |
| 73 | + |
| 74 | +--- |
| 75 | + |
| 76 | +## Part 2: Forward Sensitivity Equations |
| 77 | + |
| 78 | +Rather than using an automatic tool, here you will derive and implement the forward sensitivity equations by hand. The forward sensitivity equations track how the solution $u(t)$ changes with respect to each parameter $p_i$ by augmenting the ODE with additional equations for $s_i(t) = \partial u / \partial p_i$. |
| 79 | + |
| 80 | +Given the ODE $\dot{u} = f(u, p, t)$, differentiating with respect to $p_i$ gives: |
| 81 | + |
| 82 | +$$\frac{d s_i}{dt} = \frac{\partial f}{\partial u} s_i + \frac{\partial f}{\partial p_i}, \qquad s_i(0) = \frac{\partial u_0}{\partial p_i}$$ |
| 83 | + |
| 84 | +where $\partial f / \partial u$ is the Jacobian of $f$ with respect to the state $u$. |
| 85 | + |
| 86 | +**Tasks:** |
| 87 | + |
| 88 | +1. Derive by hand all four Jacobian blocks needed for the Lotka-Volterra system: |
| 89 | + - $\partial f / \partial u$ (a $2 \times 2$ matrix) |
| 90 | + - $\partial f / \partial p_i$ for each of the four parameters (each a vector of length 2) |
| 91 | + |
| 92 | +2. Implement an augmented ODE `f_aug(dz, z, p, t)` where the state $z$ concatenates $u$ and the four sensitivity vectors $s_1, s_2, s_3, s_4$ (so $z \in \mathbb{R}^{10}$). The initial condition for each $s_i(0)$ is zero (assuming $u_0$ does not depend on $p$). |
| 93 | + |
| 94 | +3. Solve the augmented system with the same solver and time span as Part 1. |
| 95 | + |
| 96 | +4. Extract and plot the sensitivity of $u_1$ and $u_2$ with respect to each parameter over time. Comment on which parameters the solution is most sensitive to and when. |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Part 3: Forward-Mode Automatic Differentiation via Dual Numbers |
| 101 | + |
| 102 | +In the lecture notes on [Forward-Mode AD via High-Dimensional Algebras](https://book.sciml.ai/notes/08-Forward-Mode_Automatic_Differentiation_(AD)_via_High_Dimensional_Algebras/) you saw that forward-mode AD can be implemented by defining a new number type that carries both a primal value and a derivative (the "dual" part), and then overloading arithmetic so that the chain rule propagates automatically. |
| 103 | + |
| 104 | +**Tasks:** |
| 105 | + |
| 106 | +1. Define a `Dual` number struct that holds a value `v` and a derivative `d`: |
| 107 | + |
| 108 | + ```julia |
| 109 | + struct Dual{T} <: Number |
| 110 | + v::T |
| 111 | + d::T |
| 112 | + end |
| 113 | + ``` |
| 114 | + |
| 115 | +2. Overload the following operations for `Dual` numbers following the rules derived from the algebra of dual numbers (i.e. $\epsilon^2 = 0$): |
| 116 | + - `+`, `-`, `*`, `/` |
| 117 | + - `^` (integer powers are sufficient, but feel free to implement the general case) |
| 118 | + - `exp`, `log`, `sin`, `cos` |
| 119 | + - Comparison operators and `isless` (needed for the ODE solver internals) |
| 120 | + - Promotion and conversion rules so that `Dual` interacts correctly with `Float64` and other numeric types |
| 121 | + |
| 122 | +3. Write a generic function `derivative(f, x)` that uses your `Dual` type to compute $f'(x)$ by evaluating `f(Dual(x, one(x)))` and extracting the dual part. |
| 123 | + |
| 124 | +4. Verify your implementation on $f(x) = x^3 \sin(x)$ and $f(x) = e^{x^2}$ and check against analytically known derivatives. |
| 125 | + |
| 126 | +5. Now demonstrate that forward-mode AD with your `Dual` type reproduces the sensitivity results from Part 2. Specifically: for a fixed time point $t=10$, show that propagating `Dual` numbers through the ODE solve (by passing `Dual`-valued parameters into your Part 1 ODE function and solver) recovers $\partial u(t^*) / \partial p_i$ to "within floating point error", matching the augmented ODE result from Part 2. |
| 127 | + |
| 128 | +**Note:** Getting your `Dual` type to work through the ODE solver requires that all arithmetic inside the solver be generic. The Tsit5 implementation in `OrdinaryDiffEq.jl` is written generically and will work if your number type satisfies the necessary interface. Beyond the operations listed above, you will need to overload several additional functions: |
| 129 | + |
| 130 | +- **`Base.muladd`**: The ODE solver uses the `@muladd` macro, which rewrites `a*b + c` into `muladd(a, b, c)` throughout the Runge-Kutta step computation. You will need `muladd` methods for all combinations of `Dual` and `Number` arguments (7 signatures total). Implementing it as `a * b + c` is sufficient. |
| 131 | +- **`Base.sqrt`**, **`Base.abs2`**, **`Base.inv`**, **`Base.log10`**: Used in error norm computation, step size control, and initial time step estimation. Implement these using the standard chain rule. |
| 132 | +- **`Base.:^(::Dual, ::Dual)`** and **`Base.:^(::Dual, ::Float64)`**: The initial step size computation evaluates `10^(-(2 + log10(...))/order)`, which promotes integers to `Dual` via your promotion rules and requires a general power rule. Use $\frac{d}{dx}[f^g] = f^g(g' \ln f + g f'/f)$. |
| 133 | +- **`Base.zero`**, **`Base.one`**, **`Base.abs`**: Construct a `Dual` with zero derivative for `zero`/`one`; for `abs`, use `sign` to flip the derivative. |
| 134 | +- **`Base.isnan`**, **`Base.isinf`**, **`Base.isfinite`**, **`Base.iszero`**: Predicates for safety checks; these should operate on the primal value only and return `Bool`. |
| 135 | + |
| 136 | +The following overloads are solver-internal plumbing (type conversions, step size bookkeeping, etc.) that are not mathematically interesting but are required for the solver to run. All of these follow from the idea that a real number is a dual number with 0 dual part, and that control flow should operate on the dual-valued function exactly the same as the real-valued function (i.e. the primal is preserved for non-differentiable operations). |
| 137 | + |
| 138 | +```julia |
| 139 | +Dual{T}(x::Dual{T}) where {T} = x |
| 140 | +Dual{T}(x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d)) |
| 141 | +Dual{T}(x::Number) where {T} = Dual(convert(T, x), zero(T)) |
| 142 | +Base.convert(::Type{Dual{T}}, x::Dual{T}) where {T} = x |
| 143 | +Base.convert(::Type{Dual{T}}, x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d)) |
| 144 | +Base.convert(::Type{Dual{T}}, x::Number) where {T} = Dual(convert(T, x), zero(T)) |
| 145 | +Base.promote_rule(::Type{Dual{T}}, ::Type{S}) where {T, S<:Number} = Dual{promote_type(T, S)} |
| 146 | +Base.convert(::Type{T}, x::Dual) where {T<:AbstractFloat} = convert(T, x.v) |
| 147 | +(::Type{T})(x::Dual) where {T<:AbstractFloat} = T(x.v) |
| 148 | +Base.oneunit(::Type{Dual{T}}) where {T} = Dual(oneunit(T), zero(T)) |
| 149 | +Base.oneunit(a::Dual) = oneunit(typeof(a)) |
| 150 | +Base.eps(::Type{Dual{T}}) where {T} = eps(T) |
| 151 | +Base.eps(a::Dual) = eps(a.v) |
| 152 | +Base.floatmin(::Type{Dual{T}}) where {T} = floatmin(T) |
| 153 | +Base.float(::Type{Dual{T}}) where {T} = Dual{float(T)} |
| 154 | +Base.float(x::Dual{T}) where {T} = Dual(float(x.v), float(x.d)) |
| 155 | +Base.real(a::Dual) = a |
| 156 | +Base.nextfloat(a::Dual) = Dual(nextfloat(a.v), a.d) |
| 157 | +Base.isless(a::Dual, b::Real) = isless(a.v, b) |
| 158 | +Base.isless(a::Real, b::Dual) = isless(a, b.v) |
| 159 | +Base.rtoldefault(::Type{Dual{T}}) where {T} = Base.rtoldefault(T) |
| 160 | +``` |
| 161 | + |
| 162 | +--- |
| 163 | + |
| 164 | +## Part 4: Parameter Estimation via Gradient Descent |
| 165 | + |
| 166 | +Use the solution from Part 1 as your "data". That is, solve the Lotka-Volterra system with the true parameters $p^* = [1.5, 1.0, 3.0, 1.0]$ and save the solution at the same discrete time points your solver outputs. This gives you a reference trajectory with no noise, so a perfect optimizer should be able to recover $p^*$ exactly. |
| 167 | + |
| 168 | +**Tasks:** |
| 169 | + |
| 170 | +1. Define a mean-squared-error loss function: |
| 171 | + |
| 172 | +$$L(p) = \frac{1}{N} \sum_{i=1}^{N} \|u(t_i; p) - u(t_i; p^*)\|^2$$ |
| 173 | + |
| 174 | +where $u(t_i; p^*)$ are the reference trajectory values from Part 1 and $u(t_i; p)$ is the ODE solution at time $t_i$ with the current parameter guess $p$. |
| 175 | + |
| 176 | +2. Compute the gradient $\nabla_p L$ using the forward sensitivity equations from Part 2. For each observation time $t_i$ and each parameter $p_j$, use the augmented ODE solution to obtain $\partial u(t_i; p) / \partial p_j$, then apply the chain rule: |
| 177 | + |
| 178 | +$$\frac{\partial L}{\partial p_j} = \frac{2}{N} \sum_{i=1}^{N} \left(u(t_i; p) - u(t_i; p^*)\right)^T \frac{\partial u(t_i; p)}{\partial p_j}$$ |
| 179 | + |
| 180 | +3. Starting from the perturbed initial guess $p_0 = [1.2, 0.8, 2.8, 0.8]$, implement a gradient descent loop with a fixed learning rate (a learning rate of $\eta = 0.0002$ works; you may need to experiment). Run for at least 1000 iterations and print the loss periodically (e.g. every 50 iterations). |
| 181 | + |
| 182 | +4. Plot the loss curve over iterations. Plot the final estimated trajectory alongside the reference trajectory and confirm they visually agree. Report the final estimated parameters and verify they are close to $p^*$. |
| 183 | + |
| 184 | +5. Since the data is exact (no noise), the minimum of $L$ is zero and is achieved at $p = p^*$. Does your gradient descent reach a loss near machine epsilon? If not, explain why not and what you would need to do differently to get closer. |
| 185 | + |
| 186 | +**Note:** You are not required to use a sophisticated optimizer. Plain gradient descent is expected here. The point is to see how the sensitivity equations provide exact (up to ODE solver tolerance) gradients that enable optimization, and that when the data is generated by the model itself, the true parameters are recoverable in principle. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## Submission |
| 191 | + |
| 192 | +Submit a single Julia script (Pluto notebook), or Jupyter notebook (`.ipynb`) containing all four parts. Each part should include: |
| 193 | + |
| 194 | +- Your code |
| 195 | +- Plots where requested |
| 196 | +- Written answers to any qualitative questions (as comments or markdown cells) |
| 197 | + |
| 198 | +Make sure your notebook runs end-to-end without errors before submitting. |
0 commit comments