Skip to content

Commit a95611f

Browse files
committed
Create 18.337 2026 hw2.md
1 parent ef4ff17 commit a95611f

File tree

1 file changed

+166
-0
lines changed

1 file changed

+166
-0
lines changed

18.337 2026 hw2.md

Lines changed: 166 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,166 @@
1+
# 18.337 Homework 2: Sensitivity Analysis and Automatic Differentiation
2+
3+
---
4+
5+
## Overview
6+
7+
In this homework you will work through the core machinery underlying differentiable simulation: solving an ODE, computing parameter sensitivities analytically, building a simple forward-mode AD system from scratch, and using all of these pieces together to fit a model to data via gradient descent.
8+
9+
You will need the following packages:
10+
11+
```julia
12+
using OrdinaryDiffEq
13+
using Plots
14+
```
15+
16+
---
17+
18+
## Part 1: Solving the Lotka-Volterra Equations
19+
20+
The Lotka-Volterra (predator-prey) equations are:
21+
22+
$$\frac{du_1}{dt} = \alpha u_1 - \beta u_1 u_2$$
23+
$$\frac{du_2}{dt} = -\delta u_2 + \gamma u_1 u_2$$
24+
25+
where $u_1$ is the prey population, $u_2$ is the predator population, and $p = (\alpha, \beta, \delta, \gamma)$ are the model parameters.
26+
27+
**Tasks:**
28+
29+
1. Implement the Lotka-Volterra ODE as a function compatible with the SciML/DiffEq interface, i.e. with signature `f(du, u, p, t)`.
30+
31+
2. Solve the system on the time span $t \in [0, 10]$ with initial condition $u_0 = [1.0, 1.0]$ and parameters $p = [1.5, 1.0, 3.0, 1.0]$ using the `Tsit5()` solver from `OrdinaryDiffEq.jl`.
32+
33+
3. Plot both $u_1(t)$ and $u_2(t)$ on the same axes and describe qualitatively what you observe about the dynamics.
34+
35+
**Hint:** An `ODEProblem` is constructed as:
36+
37+
```julia
38+
prob = ODEProblem(f, u0, tspan, p)
39+
sol = solve(prob, Tsit5())
40+
```
41+
42+
---
43+
44+
## Part 2: Forward Sensitivity Equations
45+
46+
Rather than using an automatic tool, here you will derive and implement the forward sensitivity equations by hand. The forward sensitivity equations track how the solution $u(t)$ changes with respect to each parameter $p_i$ by augmenting the ODE with additional equations for $s_i(t) = \partial u / \partial p_i$.
47+
48+
Given the ODE $\dot{u} = f(u, p, t)$, differentiating with respect to $p_i$ gives:
49+
50+
$$\frac{d s_i}{dt} = \frac{\partial f}{\partial u} s_i + \frac{\partial f}{\partial p_i}, \qquad s_i(0) = \frac{\partial u_0}{\partial p_i}$$
51+
52+
where $\partial f / \partial u$ is the Jacobian of $f$ with respect to the state $u$.
53+
54+
**Tasks:**
55+
56+
1. Derive by hand all four Jacobian blocks needed for the Lotka-Volterra system:
57+
- $\partial f / \partial u$ (a $2 \times 2$ matrix)
58+
- $\partial f / \partial p_i$ for each of the four parameters (each a vector of length 2)
59+
60+
2. Implement an augmented ODE `f_aug(dz, z, p, t)` where the state $z$ concatenates $u$ and the four sensitivity vectors $s_1, s_2, s_3, s_4$ (so $z \in \mathbb{R}^{10}$). The initial condition for each $s_i(0)$ is zero (assuming $u_0$ does not depend on $p$).
61+
62+
3. Solve the augmented system with the same solver and time span as Part 1.
63+
64+
4. Extract and plot the sensitivity of $u_1$ and $u_2$ with respect to each parameter over time. Comment on which parameters the solution is most sensitive to and when.
65+
66+
---
67+
68+
## Part 3: Forward-Mode Automatic Differentiation via Dual Numbers
69+
70+
In the lecture notes on [Forward-Mode AD via High-Dimensional Algebras](https://book.sciml.ai/notes/08-Forward-Mode_Automatic_Differentiation_(AD)_via_High_Dimensional_Algebras/) you saw that forward-mode AD can be implemented by defining a new number type that carries both a primal value and a derivative (the "dual" part), and then overloading arithmetic so that the chain rule propagates automatically.
71+
72+
**Tasks:**
73+
74+
1. Define a `Dual` number struct that holds a value `v` and a derivative `d`:
75+
76+
```julia
77+
struct Dual{T} <: Number
78+
v::T
79+
d::T
80+
end
81+
```
82+
83+
2. Overload the following operations for `Dual` numbers following the rules derived from the algebra of dual numbers (i.e. $\epsilon^2 = 0$):
84+
- `+`, `-`, `*`, `/`
85+
- `^` (integer powers are sufficient, but feel free to implement the general case)
86+
- `exp`, `log`, `sin`, `cos`
87+
- Comparison operators and `isless` (needed for the ODE solver internals)
88+
- Promotion and conversion rules so that `Dual` interacts correctly with `Float64` and other numeric types
89+
90+
3. Write a generic function `derivative(f, x)` that uses your `Dual` type to compute $f'(x)$ by evaluating `f(Dual(x, one(x)))` and extracting the dual part.
91+
92+
4. Verify your implementation on $f(x) = x^3 \sin(x)$ and $f(x) = e^{x^2}$ and check against analytically known derivatives.
93+
94+
5. Now demonstrate that forward-mode AD with your `Dual` type reproduces the sensitivity results from Part 2. Specifically: for a fixed time point $t=10$, show that propagating `Dual` numbers through the ODE solve (by passing `Dual`-valued parameters into your Part 1 ODE function and solver) recovers $\partial u(t^*) / \partial p_i$ to "within floating point error", matching the augmented ODE result from Part 2.
95+
96+
**Note:** Getting your `Dual` type to work through the ODE solver requires that all arithmetic inside the solver be generic. The Tsit5 implementation in `OrdinaryDiffEq.jl` is written generically and will work if your number type satisfies the necessary interface. Beyond the operations listed above, you will need to overload several additional functions:
97+
98+
- **`Base.muladd`**: The ODE solver uses the `@muladd` macro, which rewrites `a*b + c` into `muladd(a, b, c)` throughout the Runge-Kutta step computation. You will need `muladd` methods for all combinations of `Dual` and `Number` arguments (7 signatures total). Implementing it as `a * b + c` is sufficient.
99+
- **`Base.sqrt`**, **`Base.abs2`**, **`Base.inv`**, **`Base.log10`**: Used in error norm computation, step size control, and initial time step estimation. Implement these using the standard chain rule.
100+
- **`Base.:^(::Dual, ::Dual)`** and **`Base.:^(::Dual, ::Float64)`**: The initial step size computation evaluates `10^(-(2 + log10(...))/order)`, which promotes integers to `Dual` via your promotion rules and requires a general power rule. Use $\frac{d}{dx}[f^g] = f^g(g' \ln f + g f'/f)$.
101+
- **`Base.zero`**, **`Base.one`**, **`Base.abs`**: Construct a `Dual` with zero derivative for `zero`/`one`; for `abs`, use `sign` to flip the derivative.
102+
- **`Base.isnan`**, **`Base.isinf`**, **`Base.isfinite`**, **`Base.iszero`**: Predicates for safety checks; these should operate on the primal value only and return `Bool`.
103+
104+
The following overloads are solver-internal plumbing (type conversions, step size bookkeeping, etc.) that are not mathematically interesting but are required for the solver to run. All of these follow from the idea that a real number is a dual number with 0 dual part, and that control flow should operate on the dual-valued function exactly the same as the real-valued function (i.e. the primal is preserved for non-differentiable operations).
105+
106+
```julia
107+
Dual{T}(x::Dual{T}) where {T} = x
108+
Dual{T}(x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d))
109+
Dual{T}(x::Number) where {T} = Dual(convert(T, x), zero(T))
110+
Base.convert(::Type{Dual{T}}, x::Dual{T}) where {T} = x
111+
Base.convert(::Type{Dual{T}}, x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d))
112+
Base.convert(::Type{Dual{T}}, x::Number) where {T} = Dual(convert(T, x), zero(T))
113+
Base.promote_rule(::Type{Dual{T}}, ::Type{S}) where {T, S<:Number} = Dual{promote_type(T, S)}
114+
Base.convert(::Type{T}, x::Dual) where {T<:AbstractFloat} = convert(T, x.v)
115+
(::Type{T})(x::Dual) where {T<:AbstractFloat} = T(x.v)
116+
Base.oneunit(::Type{Dual{T}}) where {T} = Dual(oneunit(T), zero(T))
117+
Base.oneunit(a::Dual) = oneunit(typeof(a))
118+
Base.eps(::Type{Dual{T}}) where {T} = eps(T)
119+
Base.eps(a::Dual) = eps(a.v)
120+
Base.floatmin(::Type{Dual{T}}) where {T} = floatmin(T)
121+
Base.float(::Type{Dual{T}}) where {T} = Dual{float(T)}
122+
Base.float(x::Dual{T}) where {T} = Dual(float(x.v), float(x.d))
123+
Base.real(a::Dual) = a
124+
Base.nextfloat(a::Dual) = Dual(nextfloat(a.v), a.d)
125+
Base.isless(a::Dual, b::Real) = isless(a.v, b)
126+
Base.isless(a::Real, b::Dual) = isless(a, b.v)
127+
Base.rtoldefault(::Type{Dual{T}}) where {T} = Base.rtoldefault(T)
128+
```
129+
130+
---
131+
132+
## Part 4: Parameter Estimation via Gradient Descent
133+
134+
Use the solution from Part 1 as your "data". That is, solve the Lotka-Volterra system with the true parameters $p^* = [1.5, 1.0, 3.0, 1.0]$ and save the solution at the same discrete time points your solver outputs. This gives you a reference trajectory with no noise, so a perfect optimizer should be able to recover $p^*$ exactly.
135+
136+
**Tasks:**
137+
138+
1. Define a mean-squared-error loss function:
139+
140+
$$L(p) = \frac{1}{N} \sum_{i=1}^{N} \|u(t_i; p) - u(t_i; p^*)\|^2$$
141+
142+
where $u(t_i; p^*)$ are the reference trajectory values from Part 1 and $u(t_i; p)$ is the ODE solution at time $t_i$ with the current parameter guess $p$.
143+
144+
2. Compute the gradient $\nabla_p L$ using the forward sensitivity equations from Part 2. For each observation time $t_i$ and each parameter $p_j$, use the augmented ODE solution to obtain $\partial u(t_i; p) / \partial p_j$, then apply the chain rule:
145+
146+
$$\frac{\partial L}{\partial p_j} = \frac{2}{N} \sum_{i=1}^{N} \left(u(t_i; p) - u(t_i; p^*)\right)^T \frac{\partial u(t_i; p)}{\partial p_j}$$
147+
148+
3. Starting from the perturbed initial guess $p_0 = [1.2, 0.8, 2.8, 0.8]$, implement a gradient descent loop with a fixed learning rate (a learning rate of $\eta = 0.0002$ works; you may need to experiment). Run for at least 1000 iterations and print the loss periodically (e.g. every 50 iterations).
149+
150+
4. Plot the loss curve over iterations. Plot the final estimated trajectory alongside the reference trajectory and confirm they visually agree. Report the final estimated parameters and verify they are close to $p^*$.
151+
152+
5. Since the data is exact (no noise), the minimum of $L$ is zero and is achieved at $p = p^*$. Does your gradient descent reach a loss near machine epsilon? If not, explain why not and what you would need to do differently to get closer.
153+
154+
**Note:** You are not required to use a sophisticated optimizer. Plain gradient descent is expected here. The point is to see how the sensitivity equations provide exact (up to ODE solver tolerance) gradients that enable optimization, and that when the data is generated by the model itself, the true parameters are recoverable in principle.
155+
156+
---
157+
158+
## Submission
159+
160+
Submit a single Julia script (Pluto notebook), or Jupyter notebook (`.ipynb`) containing all four parts. Each part should include:
161+
162+
- Your code
163+
- Plots where requested
164+
- Written answers to any qualitative questions (as comments or markdown cells)
165+
166+
Make sure your notebook runs end-to-end without errors before submitting.

0 commit comments

Comments
 (0)